Volume RQC-9 December, 1960 Number 3

TABLE OF CONTENTS


PAGE
Military System Reliability: Department of
Defense Contributions . . . J. Spiegel and E. M. Bennett 1
Breaking Even on Failure Rate Reduction . . . J. B. Heyne 9
Optimal Diagnostic Procedures . . . B. B. Winter 13
New Autopsy Techniques for Transistors
and [...] . . . C. B. Clark and E. F. Duffek 20
On Prediction of System Behavior . . . Joan R. Rosenblatt 23
An Application of the Information Theory
Approach to Failure Diagnosis . . . E. J. Kletsky 29
Back Issues of Reliability Proceedings Now Available . . . 40




PROFESSIONAL GROUP ON RELIABILITY AND QUALITY CONTROL


The Professional Group on Reliability and Quality Control is an organization, [...]


Annual Fee: $3.00

Chairman
P. K. McElroy
Vice Chairmen
J. C. McAdam L. J. Paddison C. M. Ryerson
Secretary Treasurer
R. F. Rollman H. J. Stryker
Editor
W. X. Lamb, Jr.

H. Cary P. K. McElroy J. R. Somerville

J. W. Greer L. L. Schneider R. L. Vander Hamm
D. A. Hill M. P. Smith Victor Wouk


IRE TRANSACTIONS
on Reliability and Quality Control


Published by the Institute of Radio Engineers, Inc., for the Professional Group 
on Reliability and Quality Control, 1 East 79 Street, New York 21, N. Y. 
Responsibility for the contents rests upon the authors, and not upon the IRE, the 
Group or its members. Individual copies of this issue and all available back issues 
may be purchased at the following prices: IRE members (one copy) $2.25,
libraries and colleges $3.25, all others $4.50. 


Copyright © 1960 — THE INSTITUTE OF RADIO ENGINEERS, INC. 
All rights, including translation, are reserved by the IRE. Requests for republication
privileges should be addressed to the Institute of Radio Engineers, 1 East 79 Street,
New York 21, N. Y.


PRINTED IN U.S.A. 


Military System Reliability: 


Department of Defense Contributions 


J. SPIEGEL† AND E. M. BENNETT†


Summary: This report describes the Defense
Department’s increasing concern regarding elec-
tronic equipment reliability during the period
1942-1959. It discusses the establishment of the
Joint Army-Navy Vacuum Tube Development Com-
mittee (VTDC) in June, 1943, and VTDC’s suc-
cessor, the Panel on Electron Tubes (PET) in
October, 1946. Also described is the formation of
the Ad Hoc Group on Reliability of Electronic
Equipment in December, 1950, the Advisory Group
on Reliability of Electronic Equipment (AGREE) in
August, 1952, the Advisory Group on Electron
Tubes (AGET) in March, 1954, the Advisory Group
on Electron Parts (AGEP) in June, 1954, and the
Ad Hoc Committee for Guided Missile Reliability
(ACGMR) in March, 1957. The interrelation of the
tasks and findings of these organizations is dis-
cussed.


INTRODUCTION 


The achievement of adequate reliability in any
complex system involves appreciable technical
and managerial innovation along with particularly
high standards of production and use. This is
especially the case when the system is a mili-
tary one and when it is also highly dependent upon
electronics. Therefore, it is not surprising to
note that the Department of Defense has played a
vital role in determining the direction of a variety
of reliability efforts, especially in the areas of
military electronics.


1942-1949


The modern quest of the Department of Defense
for higher reliability in military electronic sys-
tems and equipment started, for all practical pur-
poses, with the onset of World War II.¹
Prior to that war, military electronic equipment
was relatively simple, and, since simple electronic
devices are more reliable than complex ones, the
problems attendant upon unreliability were not
greatly stressed. However, concomitant with the
onset of our preparations for war came a tremen-
dous demand from the military services for a host
of what were then seen as relatively complex elec-
tronic equipments. New military demands required
developmental efforts for new tubes, new circuitry,
new applications of components, and an increased
ability for the equipment to withstand higher levels
of environmental stress.

†The Mitre Corp., 244 Wood St., Lexington 73, Mass.

¹In this review we have limited our attention to one ap-
parently significant flow of historical forces relevant to
military system reliability. We have not discussed the
multiple contributions either of allied agencies such as
inservice military laboratories, of allied fields such as
quality control statistics, of allied industrial and engi-
neering firms, or of professional groups and organiza-
tions. These have not been neglected, however, for any
search of the literature in the field of system reliability
will quickly demonstrate that many advances are a di-
rect result of the efforts of such groups.

Since numbers of organizations were interested
in developing new tubes and novel applications of
tubes, it was realized by many that, without con-
trol and coordination, millions of dollars could be
spent in this direction without firm results to show
for such expenditures. Recognizing this need,
members of the Radiation Laboratory of the Massa-
chusetts Institute of Technology proposed that a
committee be formed to coordinate all such tube
developmental efforts.

On November 4, 1942, a Radiation Laboratory
report was submitted to the Navy Bureau of Ships.
First, noting that the complexity of the vacuum
tube is such that it must be considered a separate
and distinct piece of apparatus, the report sug-
gested that “. . . large and continuing demands for
new applications, together with relatively short
life and coincident problems of supply, create an
unusual volume of problems demanding the exist-
ence of a group whose responsibility is confined to
vacuum tube coordination.” Further, “. . . al-
though several of these problems might be referred
to existing bodies, none has the authority to control
or coordinate the whole subject in any given in-
stance nor are these examples inclusive of the
scope of the total problem. In the United States
there are industrial, Army, Navy, and university
vacuum tube laboratories. Within the limits of in-
formality and lack of authority there is a degree
of cooperation among these various institutions,
but, without direction, duplication, conflict and
misguided effort must of necessity occur.”

The report went on to detail its specific recom-
mendations² for the formation of this proposed
committee. It suggested that the committee be 
organized as an executive body of the Joint Com- 
munications Board (JCB) and that its responsi- 
bilities should be broad enough to fully coordinate 
and encourage tube development work. 

The Bureau of Ships agreed in principle with 
the establishment of a coordinating body and, on 
January 18, 1943, initially recommended that it be 
made part of the Radiation Laboratory. However, 
additional consultations between the Chief of Engi- 
neering and Technical Service of the Office of the 
Chief Signal Officer and the Head of the Radio Di- 
vision of the Bureau of Ships resulted in a pro- 
posal for a joint Army-Navy Vacuum Tube De- 
velopment Committee to be formed by the Office 
of Scientific Research and Development (OSRD). 
This final proposal was reviewed by the Joint 
Committee on New Weapons and Equipment of the 
Joint Chiefs of Staff and, as a result, the Chair- 
man of the National Defense Research Committee 
(NDRC) was asked to form such a committee. 

On June 7, 1943, the first meeting was held, 
with Dr. I. I. Rabi as Chairman. Dr. Vannevar 
Bush, then Director of OSRD, attended the first 
meeting and expressed his feeling that the initial 
formation of the Vacuum Tube Development Com- 
mittee (VTDC) was of an experimental nature and 
that in the future, if events warranted, it might be 
taken from NDRC and placed under joint Army, 
Navy and OSRD sponsorship. 

On August 23, 1943, the Director of the Vacuum 
Tube Development Committee issued the first of 
what proved to be a series of publications on 
VTDC history, directives and information. Essen- 
tially, these publications were a review of the his- 
tory of the VTDC and provided information as to 
the membership of the committee. Included in 
these reports were restatements of the VTDC 
responsibilities as well as a detailing of the meth- 
ods by which the VTDC would live up to them. 

Some of the responsibilities were:

1) to keep itself informed regarding vacuum
tube research and development;

2) to formulate plans and make recommen-
dations regarding specific vacuum tube
research and development programs;

3) to consider problems of vacuum tube re-
search and development suggested by any
of its members or by liaison representa-
tives to the committee;

4) to investigate and designate operating
conditions for new tubes and to specify
tests to determine the suitability for new
types of service use;

5) to recommend to the JAN-1 Specifications
Committee additions to the joint Army-
Navy preferred list of vacuum tubes with
preliminary specification data on newly
developed tubes;

6) to recommend appropriate action to the
Joint Vacuum Tube Control Committee
regarding the procurement and assign-
ment of priorities and precedents for the
procurement of developmental tubes; and

7) to pass information from one commercial
concern to another only with the consent
of the originator.

²The full text of the specific recommendations may be
obtained from the authors.

With the VTDC in need of technical facilities 
and personnel in order to realize its responsibili- 
ties, a contract was established between OSRD and 
Columbia University, under which the Vacuum 
Tube Development Group was organized, at Colum- 
bia, to serve as a Secretariat for the full com- 
mittee. 

Throughout the war, both the Vacuum Tube De- 
velopment Committee and Group were successful 
in coordinating efforts in the development of 
vacuum tubes. A survey of all the vacuum tube 
research and development efforts going on in the 
United States was started. Project lists and tech- 
nical information were published regularly. These 
accumulated contributions during the war years
were so well received that desires were expressed
by various persons and organizations to have the
Vacuum Tube Development Committee continue
after the war. Fears, however, were expressed
that cuts in NDRC funds might cause the VTDC’s
premature demise.

As a result of these fears, proposals were made
in 1944 for the Committee to become sponsored
jointly by the Army and Navy to insure that, when
the war in Europe ended, enough financial support
would be available to guarantee that the work of
the Committee could continue. This proposal was
taken under advisement and at a meeting held at
the Pentagon, on January 5, 1945, it was agreed
that the Signal Corps of the United States Army
would integrate the Vacuum Tube Development
Committee as part of its Joint Communications
Board, with fifty per cent of its funding supplied
by the Navy.

The Joint Communications Board, on March 1,
1945, issued a new directive for the Committee
which differed from the old NDRC directive by the
addition of an eighth requirement, “It shall have
cognizance over and shall direct engineering,
technical, secretarial and other contractor serv-
ices provided by the War and Navy Departments in
fulfillment of the VTDC function.”

Fifteen months later, when the Charter was
written for the Joint Research and Development
Board (JRDB) on June 6, 1946 and amended on
July 3, 1946, it was noted that specialty panels
could be formed to assist the Board in its activi-
ties and functions. On August 15, 1946, the JRDB
established the Committee on Electronics and in-
cluded in its directive the statement that upon ap-
proval of the Joint Chiefs of Staff, the VTDC would
be transferred to the Committee on Electronics.
On October 24, 1946, this transfer was accom-
plished and the Panel on Electron Tubes (PET)
was formed.

The objective of the Panel on Electron Tubes
was established by the JRDB as the “. . . achieve-
ment of a well-balanced program of research and
development of electron tubes with a view to the
long-term requirements of the War and Navy De-
partments. The accomplishment of this objective
necessitates the continuing study, evaluation, im-
provement and allocation of electron tubes re-
search and development plans, programs, and
problems in the national defense effort and in re-
lation to the available and potential store of scien-
tific information, personnel and facilities. . . .”

In order to pursue the stated objective, the
Panel on Electron Tubes was directed to obtain
information about vacuum tube research and de-
velopment programs within and outside of the
United States. They were further required to
analyze the information they obtained in order to
avoid duplication of effort, to focus “. . . constant
emphasis upon the major problems . . . ,” to
“. . . determine . . . serious gaps which exist . . . ,”
and to estimate the future requirements for fa-
cilities, equipment, and personnel.

The PET pursued this program under the JRDB
until December 21, 1948, at which time the new
Research and Development Board, Committee on
Electronics, issued a revised directive to the
Panel on Electron Tubes. The new directive, al-
though not significantly different from the preced-
ing one, did reorient some of its effort. Now, at
least once each year, presentations of an inte-
grated program of research and development in
vacuum tubes for military purposes were re-
quired by the new directive. A requirement for
[...] studies as to the allocation of responsibility
for specific programs among the military staff
was also included. In general, the broad require-
ment of the “. . . achievement of a well-balanced
program of research and development of electron
tubes . . .” remained.

With the establishment of the Panel on Electron 
Tubes under the Research and Development Board, 
the support contract was passed in 1949 to New 
York University, where it presently resides. 


1950-1959


Concomitant with these changes within the or- 
ganization of the Panel on Electron Tubes, the De- 
fense Department, through the Research and De- 
velopment Board, noted that difficulties with elec- 
tronic equipments were not governed solely by a 
focus upon electron tubes and their associated cir- 
cuitry. At this time, as well, various postwar in- 
vestigations and studies were reaching the conclu- 
sion that excessive percentages of electronic gear 
were being received by their military users in un- 
usable fashion and that those equipments which did 
work were not working consistently. Considering 
this increasing evidence, the Research and De- 
velopment Board, on December 7, 1950, formed the 
Ad Hoc Group on Reliability of Electronic Equip- 
ment through its Committee on Electronics. 

The Ad Hoc Group was directed by the Com- 
mittee on Electronics to “. . . 1. Determine the 
major goals and problems of field maintenance of 
electronic equipment and direct constant emphasis 
on greater reliability and equipment designs which 
would reduce these problems. 2. Summarize the 
causes of failure in existing equipments, systems, 
and current procedures. 3. Appraise the industrial 
potential to meet military requirements in this 
field. 4. Evaluate the concepts of component design 
and systems to reduce field maintenance problems. 
5. Study existing and proposed research and de- 
velopment programs in the light of such concepts 
and make recommendations to the Committee 
thereof. 6. Recommend areas where additional or 
new research and development effort is needed.” 

To fulfill this assignment, Ad Hoc Group mem- 
bers were appointed from the three military de- 
partments, the Joint Chiefs of Staff, the Munitions 
Board, and various civilian organizations and pro- 
fessions. The Panel on Electron Tubes was in- 
cluded as an advisor. 

During this period that the Ad Hoc Group was 
active, 1950-1952, the Army and Navy undertook 
increasing numbers of studies in order to add to 
their knowledge of electronic equipment failures. 
For example, the Navy contracted for the Vitro 
Corporation to investigate component failures; 
Aeronautical Radio, Incorporated, to investigate 


4 _ IRE TRANSACTIONS ON RELIABILITY AND QUALITY CONTROL 


electronic tube failures; and the Bell Telephone 
Laboratories to study component part failures. 
The Army, through the Signal Corps, entered into 
a long-term Tube Analysis Program with Cornell 
University. The Air Force requested the RAND 
Corporation to investigate the general electronic 
reliability problem. These studies were con- 
ducted in addition to the military’s continuing in- 
service efforts. 

Eight months after the formation of the Ad
Hoc Group, the Chairman of the Research and De- 
velopment Board, in a letter to the Secretary of 
Defense dated August 14, 1951, noted that the Re- 
search and Development Board had established the 
Ad Hoc Group on Reliability of Electronic Equip- 
ment because of large numbers of reports of un- 
satisfactory performance of electronic equipment 
in the field. He defined the objective of the Group 
as examining the reliability program in the broad 
sense and stated that he expected that they would 
recommend measures which should result in re- 
liable performances with a minimum of mainte- 
nance. He further noted that the reliability prob- 
lem really extended far beyond the scope of the 
Research and Development Board, and, in fact, he 
felt that it required a combined effort with the 
Munitions Board, the Joint Chiefs of Staff, and the 
operating arms of the three services. According- 
ly he recommended to the Secretary of Defense 
that the Department of Defense recognize the 
broad scope of the reliability problem, that it en- 
dorse the Ad Hoc Group’s work, and that it enjoin 
all defense agencies to increase their emphasis 
on reliability factors. General George C. 
Marshall, then Secretary of Defense, issued on 
September 12, 1951, in response to this request, 
Department of Defense Directive 150.21-1, “Re- 
liability of Electronic Equipment.” 

He initially reviewed the work of the Ad Hoc 
Group and stated that “. . . reliability must be a 
prime objective in all phases of the procurement 
and use of . . . equipment.” He went on to direct 
that “. . . increased emphasis on reliability of 
military electronic equipment by all agencies of 
the Department of Defense is required.” 

Six months later, on February 18, 1952, the Ad 
Hoc Group on Reliability of Electronic Equipment 
issued their final progress report in two volumes, 
making seventeen major recommendations for 
action by various agencies. These recommenda- 
tions were that: 1) failure data reports be com- 
piled on the basis of field use and be summarized, 
evaluated, and placed in the hands of designers;
2) tube, component, and especially system relia-
bility programs be continued by appropriate
groups; 3) reliability requirements be added to
military characteristics prepared by the Joint
Communication Electronics Committee; 4) a study 
be made of maintenance minimization; 5) a study 
be made of effects of unreliable equipment; 6) re- 
liability concepts be involved in procurement, pro- 
duction, and quality control of electronic equip- 
ments; 7) the RDB reliability activity be extended 
from the initiation of military characteristics 
through its operational use and a permanent RDB 
reliability group be established; 8) educational 
activities be expanded and a reliability information 
center be maintained by the RDB to provide reli- 
ability data; 9) testing of equipment, simulating 
use in the field, be expanded; 10) analyzing and 
approving new designs with regard to easy main- 
tenance and reliable performance be established; 
11) a reliability section be put into specifications; 
12) the training of inspectors be improved;
13) engineering supervision of installation of equip-
ment by the material agencies and their contrac-
tors be improved; 14) training of operators be
improved, with reference to results of operational
abuse of equipment; 15) maintenance problems be
investigated to secure better preventive mainte-
nance, training and simpler test equipment;
16) reliability organizations be set up in the mili-
tary department; and 17) classification as to de-
gree of reliability necessary be adopted and inte-
grated into military characteristics.

Six months following, as one consequence of 
these recommendations, an Advisory Group on Re- 
liability of Electronic Equipment (AGREE) was 
formed by the Department of Defense on August
21, 1952.

When the Research and Development Board was 
abolished in 1953, AGREE was transferred to the 
Assistant Secretary of Defense (Research and 
Engineering) and re-established in 1954 as part of 
the Office of the Assistant Secretary of Defense 
(Applications Engineering). The final directive 
for AGREE, dated March 31, 1954, set the purpose 
of AGREE as assuring that “. . . the best available
scientific, engineering, production and operational
talent are applied to the achievement of reliability
in the field of military electronics. The Advisory
Group will monitor, stimulate interest in, and ad-
vise on, reliability matters within its field of elec-
tronic equipment, design, development, procure-
ment, production, maintenance, installation, op-
erations and training.”

At the same time, March 30, 1954, the Panel on 
Electron Tubes was redesignated the Advisory
Group on Electron Tubes (AGET) with essentially 
the same charter. 

A few months later, on June 8, 1954, the Ad-
visory Group on Electron Parts (AGEP) was
formed, with the objective of assisting “. . . in
achieving a sound, coordinated, and integrated re-
search and development program in the field of
electronic parts.” The field of interest of AGEP
was defined as including “. . . research and de-
velopment of electronic parts including capaci-
tors; coils, inductors, and transformers; electric
and magnetic properties of materials; electro-
mechanical devices; frequency control devices;
resistors; transmission lines; and techniques for
packaged subassemblies utilizing miniature elec-
tronic parts and printed circuits.” In order to
reach this objective, AGEP was to “. . . continu-
ously observe research and development activities
in the field of electronic parts, both within and
without the Department of Defense. . . .”

During the period following the formation of
AGREE, AGET, and AGEP, the question of unre-
liability in military equipment became a legisla-
tive issue. In July, 1954, the House of Represen-
tatives’ Committee on Government Operations had
before it for study a report by its Subcommittee
on Military Operations concerning the develop-
ment and procurement of AN/ARC-21 airborne
radio transceivers.³ The Subcommittee noted
that the stress of the Korean War was a major
factor in the need for the AN/ARC-21, but felt
that the unreliability of the equipment should have
been resolved before the Air Force ordered pro-
duction equipment. As they stated, “The Subcom-
mittee is of the opinion that it is not enough to de-
sign radio equipment which meets complex per-
formance requirements. To be of real value, the
equipment must be economical in initial cost, op-
erating cost, and maintenance costs and in provid-
ing the flexibility necessary to meet changing op-
erational demands.”

In March, 1955, the same Subcommittee re-
viewed the merits of TACAN and VOR-DME air
navigation equipment.⁴ Once again, the reliability
of the equipments was discussed and again the
Subcommittee noted that the reliability of a piece
of gear should be determined before production of
the gear is undertaken.

³Committee on Government Operations, Subcommittee
on Military Operations, 83rd Congress, 2nd Session,
“Air Force Procurement of Airborne Radio Trans-
ceivers,” House Rept. No. 2578, Washington, D.C.;
1954.

⁴Committee on Government Operations, Subcommittee
on Military Operations, 84th Congress, 1st Session,
“Military Procurement of Air Navigation Equipment, 1
and 8 March 1955,” House of Representatives Hearings,
Washington, D.C.; 1955.

By late 1955, AGREE felt “. . . that sufficient
knowledge was available and sufficient interest
aroused that specific steps could be taken toward
quantifying reliability requirements and toward
developing suitable tests to verify that such re-
quirements are met. Consequently, a program of
nine tasks in the areas of numerical reliability re-
quirements, tests, design procedures, components,
procurement, packaging and transportation, stor-
age and operation and maintenance was established.
A task group of members from the Military De-
partments and industry was assigned to each of the
tasks early in 1956.”

As a result of its investigations, a report of 
major magnitude was issued on June 4, 1957 by 
AGREE entitled “Reliability of Military Electronic 
Equipment.” The core findings of this report serve 
as a technical basis for the current approach to 
military system reliability and are as follows: 

Task Group 1 developed minimum acceptability 
figures for various electronic equipments. These 
figures, in “mean time between failures” (MTBF), 
were derived in liaison with operational commands 
and represented first steps toward a compilation 
of such calculations. 
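
As an illustration only, the sketch below shows how an MTBF figure of this kind translates into a probability of completing a mission without failure. It assumes a constant failure rate, so that R(t) = exp(-t/MTBF). That assumption, the function names, and the sample numbers belong to this sketch and are not taken from the AGREE report.

```python
import math

def mission_reliability(mtbf_hours, mission_hours):
    """Probability of no failure during the mission.

    Assumes a constant failure rate (exponential failure times),
    which is an assumption of this sketch, not of the source.
    """
    return math.exp(-mission_hours / mtbf_hours)

def required_mtbf(target_reliability, mission_hours):
    """MTBF needed to reach a target mission reliability (same assumption)."""
    return -mission_hours / math.log(target_reliability)

if __name__ == "__main__":
    # Hypothetical equipment: 100-hour MTBF on a 10-hour mission.
    print(mission_reliability(100.0, 10.0))
    # Hypothetical requirement: 0.90 reliability over the same mission.
    print(required_mtbf(0.90, 10.0))
```

Under this assumption a mission as long as the MTBF itself is completed without failure only about 37 per cent of the time, which is why a minimum acceptable MTBF must be set well above the mission length.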

Task Group 1 also recommended: “As time 
and effort for additional study become available, 
these figures be modified with regard to test en- 
vironment, state-of-the-art, compromise with 
other performance features, cost, maintenance 
load and availability. Rather than make no speci- 
fications in view of the long delays that these addi- 
tional studies may require, such modifications be 
made by considered opinions. Special studies be 
made to establish reliability requirements for the 
major air defense data-handling systems such as 
SAGE, Naval Tactical Data System, and MISSILE-
MASTER, and for missile-borne electronics equip-
ment.”⁵

In an Appendix to their report, they provided 
mathematical bases for their work as well as a 
cost model for optimizing reliability. 

⁵Advisory Group on Reliability of Electronic Equipment,
Office of the Assistant Secretary of Defense (Research
and Engineering), “Reliability of Military Electronic
Equipment,” Washington, D.C.; June 4, 1957.

Task Group 2 established a test procedure for
design equipments which they felt “. . . balances
economy of time and facility against the rigors of
high accuracy and risk of wrong decision.” On the
basis of their studies, the Group recommended
that, in addition to the proposed testing, the reli-
ability predictions prepared by the contractors
should be carefully reviewed and that a review
also be made of the contractor’s efforts in com-
ponent testing and failure follow-up. The failure-
rate test, they believed, should not be the sole
basis of decision, because: first, the time and
numbers available for test are usually quite limi-
ted, thus placing a very broad confidence limit on
the results; second, developmental models are
rarely representative of future production; and
third, the failure pattern of the developmental
models is rarely representative of all failures.

To obviate these points, the group provided details
for a careful review of reliability prediction based
upon review of paper design and the contractor’s
programs for component test-to-failure.
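
The first of these points can be made concrete with a short calculation. The sketch below is not the Task Group 2 procedure. It assumes exponentially distributed failure times and a test stopped at a fixed total operating time, and it finds two-sided confidence limits on MTBF by solving the Poisson tail probabilities numerically (equivalent to the usual chi-square limits). All names and figures are hypothetical.

```python
import math

def poisson_cdf(r, m):
    """Return P(N <= r) for N ~ Poisson(m). Intended for small failure counts."""
    term = math.exp(-m)
    total = term
    for k in range(1, r + 1):
        term *= m / k
        total += term
    return total

def solve_mean(r, target, upper=1.0e4):
    """Bisect for the Poisson mean m at which poisson_cdf(r, m) equals target.

    The CDF decreases as m grows, so plain bisection is sufficient.
    """
    lo, hi = 0.0, upper
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if poisson_cdf(r, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def mtbf_limits(total_hours, failures, confidence=0.90):
    """Two-sided confidence limits on MTBF for a time-terminated test.

    Assumes exponential failure times (an assumption of this sketch).
    With zero failures the upper limit is unbounded.
    """
    alpha = 1.0 - confidence
    lower = total_hours / solve_mean(failures, alpha / 2.0)
    if failures == 0:
        upper = math.inf
    else:
        upper = total_hours / solve_mean(failures - 1, 1.0 - alpha / 2.0)
    return lower, upper

if __name__ == "__main__":
    # Hypothetical test: 1000 equipment-hours accumulated, 2 failures observed.
    print(mtbf_limits(1000.0, 2, 0.90))
```

With these hypothetical figures the observed MTBF is 500 hours, yet the 90 per cent limits run from roughly 160 hours to roughly 2800 hours, a spread wide enough to explain why the test alone was not considered a sufficient basis of decision.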

Future analysis, they felt, would be made of all 
test failures in order that adequate corrective ac- 
tion be initiated. The contractor’s total reliability 
effort, they go on, should “. . . be supervised by 
an independent evaluation group that is not subject 
to the interests or prejudices of project personnel
on the staffs of the contractors or procurement 
agency.” 

Task Group 3 established specific routines for 
“. . . reliability index (mean life) evaluation of
pilot-production equipment. . . reliability index 
evaluation of production equipment, and... lon- 
gevity ... evaluation of .. . production equip- 
ment.” These routines permit, according to Task 
Group 3 “. . . the establishment of the equipment’s 
capability of meeting a minimum reliability re- 
quirement. . . statistically conclusive proof that 
an acceptable percentage of quantity-produced 
equipment meets a minimum reliability require- 
ment...and... conclusive proof that equipment 
reliability does not degrade below a prescribed 
minimum level during the desired life of the 
equipment.” 

The selected testing method theoretically can- 
not be affected by the contractor or by prejudiced 
testing personnel. Techniques are established in 
which the testing methods are relatively self- 
checking and immune to errors in data recording. 

Task Group 4 considered developmental pro- 
cedures for any equipment so that the equipment 
would have the required inherent reliability. The 
proposed developmental program was divided by 
the Task Group into two phases: a feasibility 
study which would include a theoretical reliability 
prediction and which is terminated by the con- 
tractor’s report of this prediction, and the design 
and construction of prototype models. 
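
One common form of theoretical reliability prediction, given here only as a sketch and not as the method prescribed by Task Group 4, treats the equipment as a series system in which any part failure fails the equipment and every part has a constant failure rate. The equipment failure rate is then the sum of the part failure rates, and the predicted MTBF is its reciprocal. The part names, quantities, and failure rates below are hypothetical.

```python
def series_failure_rate(parts):
    """Sum quantity * failure rate over all part types.

    `parts` maps a part name to (quantity, failures per hour).
    Assumes a series system with constant part failure rates,
    which is an assumption of this sketch.
    """
    return sum(quantity * rate for quantity, rate in parts.values())

def predicted_mtbf(parts):
    """Predicted equipment MTBF in hours under the same assumptions."""
    return 1.0 / series_failure_rate(parts)

if __name__ == "__main__":
    # Hypothetical parts list; the rates are illustrative, not measured data.
    parts = {
        "tube": (10, 20e-6),
        "resistor": (100, 1e-6),
        "capacitor": (50, 2e-6),
    }
    print(predicted_mtbf(parts))
```

A prediction of this kind can be made from the paper design alone, before any prototype exists, which is what allows it to close the feasibility phase.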

Task Group 5 reported that the then current 
military component specifications did not assure 
achievement of required reliability levels. The 
Qualification Approval lists, they felt, provided no 
assurance of any determinable failure rate for 
component reliability and the present military in- 
spection practices did not police reliability levels 
or yield data for reliability assurance. Group 5,
therefore, established a test procedure for “. . . de-
termining the reliability of component parts and
tubes . . . in terms of failure rate . . . or in special
parametric terms that apply to equipment relia-
bility.” They further recommended that a perma-
nent group be established at Department of Defense
level to include representatives of industry and the 
three services and to be charged with the tasks of 
developing military component specifications, of 
testing component parts for design capability, and 
of developing inspection methods. 

Task Group 6, in general, found that the present 
procurement and contracting practices and regula- 
tions were inadequate to insure the high-reliability 
objectives as noted by Task Group 1. They rec-
ommended that comprehensive sets of technical
specifications be established in order to produce
the degrees of reliability required. They recom-
mended that procurement agencies use the findings
of Task Groups 1 through 5 in the form of specifica-
tions of equipment in order to assure the procuring
of reliable equipments. 

Task Group 7 recommended, as a result of their 
investigations, that the equipment designer and the 
package designer meet early and often to determine 
the best packaging possible. Feedback from 
studies on shock and vibration during handling and 
transportation should become more effective than 
at present. The military should enforce all of the 
requirements for bracing and blocking. They 
further recommended that specifications be written 
to cover test procedures for the simulation of
transportation and handling environment.

Essentially, Task Group 8 found that failures
caused by storage are not significant when compared
with failures caused by other things. However,
they believe that this conclusion is based
upon inadequate records and, therefore, recommended
that more careful records be kept in order
to determine the cause of failure of equipment.

Task Group 9, in reviewing the methods and
procedures for maintaining the reliability of equipment
in service, made a careful study of equipment
maintainability, performance checking, disposable
modular units, test equipment calibration, as well
as maintenance publications, manuals and handbooks.
Included also in its studies were preventive
maintenance and marginal checking, the shortage
of technicians and the general education of engineers.
A large number of recommendations were
made. They felt that all contracts to be awarded 
by the Department of Defense should include a 
quantitative maintainability requirement and that 
the contractor should be required to demonstrate 
by test that his equipment has, in fact, met this 
particular maintainability requirement. 


Average-skill military technicians should be
available to perform maintenance during these
equipment evaluation tests. Maximum training
time should be limited to approximately a third of
any serviceman’s remaining enlistment. Preventive
maintenance should be limited to only those
components and parts which obey a wear-out law
of failure. Marginal checking should be used
whenever possible. Calibration centers for test
equipment should be established at various locations
to supplement existing facilities; standards
against which these equipments are calibrated
should be regularly compared with those available
at the National Bureau of Standards.

Subsequent to the publication of the report, it
was realized that efforts would have to be made
to complete the work started by Task Group 5.
Accordingly, in February, 1958, action was taken
to begin such an Ad Hoc effort. On July 14, 1958,
an agreement was reached between the Director
of Production Policy, OASD (Supply and Logistics)
and the Director of Electronics, OASD (Research
and Engineering),6 for establishing the Ad Hoc
Study Group on Parts Specification Management
for Reliability. The basic objective of the Ad Hoc
Study Group was to “. . . analyze the recommendations
established by the AGREE Task Group 5
in order to advise the Assistant Secretaries of
Defense (Research and Engineering) and (Supply
and Logistics) regarding efficient implementation
methods and procedures.”

Expanding upon this objective, the Group considered
specification preparation, requirements,
support from industry and documentation, along
with questions of Qualified Product Lists and the
need for a management organization for military
part specifications at the Department of Defense
level.

During the Sixth National Symposium on Reliability
and Quality Control in Electronics, January
11-13, 1960, some of the conclusions reached
by the Ad Hoc Study Group were published.8
Three prototype specifications were presented
which include some of the new features the Ad
Hoc Group proposes. They provide “. . . four or
five reliability levels” as well as “. . . life test
sampling plans” for various components. The
Group noted that qualification approval procedures
should be modified such that approvals will be
granted for one of the reliability levels. If the
component is improved and does reach a higher
reliability level, it then will be raised on the list.
Requalifications should be required every twelve
months.

6 E. J. Nucci, “Progress report on ad hoc study on parts
specifications management for reliability,” 1959 IRE
NATIONAL CONVENTION RECORD, pt. 6, pp. 120-129.

7 R. Soward, “Status Report on Department of Defense
Ad Hoc Study Group on Parts Specifications Management
for Reliability,” September 18, 1959.

8 R. E. Moe, “Improved component and tube specifications,”
Proc. Sixth Natl. Symp. on Reliability and Quality
Control, Washington, D.C., pp. 1-11; January 11-13,
1960.

Three months prior to the publication of the 
AGREE report, in March, 1957, the Ad Hoc Com- 
mittee for Guided Missile Reliability (ACGMR) 
was formed under the Assistant Secretary of De- 
fense, Research and Engineering, and on Novem- 
ber 15, 1957, was transferred to the Office of the 
Director of Guided Missiles. The ACGMR was to 
design a “. . . uniform monitoring program and
management procedure that can be effectively 
used for all types of guided missile projects.” 

In April, 1958, the ACGMR published their re- 
port. The management and monitoring program 
ACGMR devised starts when the contract is 
awarded and continues through all phases of de- 
sign, development, production and major product 
improvement. To insure compliance with relia- 
bility specifications and to aid the contractor in 
knowing whether these goals are being reached, 
eight test points are established, at which time a 
fully documented report from the contractor, con- 
cerning either the predicted or the verified relia- 
bility, is required. 

The first monitoring point, Detail Design Study, 
starts “with the contract award and ends with a 
design report that includes studies of system and 
subsystem reliability that encompass the entire 
weapon-system design and includes an assessment 
of reliability, using prediction techniques wher- 
ever feasible.” 

The second monitoring point, Preprototype, occurs
when “. . . the initial system design is nearly
complete and many component parts and assem- 
blies have undergone some developmental testing. 
This point may be identified by some such phrase 
as ‘95 per cent of engineering released,’ ‘design 
engineering inspection’ or ‘the time at which 
initial design is essentially complete.’ ” 

The third monitoring point, Prototype, occurs 
when the first complete sets of hardware or subsystem
hardware are available and “. . . can be
assembled into the general physical configuration 
which they will have when used by the Military 
Services. Laboratory testing has been conducted 
to demonstrate the compatibility of weapon system 
and subsystems. Special test-vehicle flights to 
obtain data for design improvement are performed. 
During this phase, all necessary research and en- 
gineering data are obtained and the basic design 
firmly established.” 




The fourth reliability monitoring point, Pre- 
production Demonstration, occurs when “. . . the 
production design of the weapon system is essen- 
tially complete and the missile or missile system 
is considered ready for production. A demonstra- 
tion of the reliability achieved during this stage 
provides one of the bases for assessing the sys- 
tem’s readiness for full-scale production.” 

The fifth monitoring point, Demonstration of 
Service Readiness, causes the Contractor to 
“. . . show that the weapon system which is usually
built under the limited- or pilot-production
program has reached the reliability objectives— 
that the system can be produced in quantity with- 
out significant loss in performance or reliability.” 

The sixth monitoring point, Service Evaluation, 
is performed by military personnel. As stated, 
“... the military service uses its own personnel 
to perform its own weapon-system evaluation 
tests. If the weapon system is found to be opera- 
tionally acceptable and is capable of being pro- 
duced in quantity without significant loss in per- 
formance or reliability, approval of production 
for service use is usually given at this monitoring 
point.” 

The seventh monitoring point, Full-Scale Production,
insures that “. . . the level of reliability
designed into the product is maintained during
production.”

The eighth and last reliability monitoring point, 
Demonstration of Major Product Improvement, 
occurs when “. . . the reliability and over-all 
value of major product improvements are demon- 
strated and may be approved for incorporation 
into the weapons system.” 

These eight reliability monitoring points are 
based upon several conclusions of the Committee. 
First, the Committee believes that “. . . reliability
is a parameter that can be predicted, assessed,
measured and controlled during the design, development,
production and major product improvement
phases of guided-missile weapon systems.”
Second, they believe that “. . . it is technically
feasible and sound to specify and monitor reliability
in guided-missile weapon systems during their
growth cycle.”

This ACGMR report had immediate conse- 
quences, since it provided the military services 
with a document detailing the management pro- 
cedure by which they could implement these tech- 
nical recommendations of such committees as 
AGREE, AGEP and AGET to assure required re- 
liability in complex military systems. 


Epilog 


If a view of the future lies in the past, then 
certain generalities appear within the realm of 
tomorrow. Military systems will depend increasingly
upon electronics, while electronic system
failures, running second only to human factor
failures, may contribute disproportionately to the 
uncertainty of future military efforts. It is more 
than likely that the Department of Defense will 
further its concern with the scientific research 
and development necessary to raise military sys- 
tem reliability. And, as a corollary, such atten- 
tion will continue to focus upon electronics, so 
long as it appears to account for an excessive 
portion of total system failure. 

The Military question will no longer be “Will 
the system do the job?” Now the challenging 
question will be “How confident can we be that 
the system will do the job when and for as long as
it is needed?”


Breaking Even on Failure Rate Reduction


J. B. HEYNE†, SENIOR MEMBER, IRE


Repeated studies of the time incidence of component
failures in complex machines indicate an
apparent exponential relationship between the
probability of failure-free operation, $P_0$, and the
time at which failures are expected to occur, $t$.
When these are plotted against each other, it is to
be expected that the slope of the curve at $t = 0$
is $-\lambda$:

$$P_0 = e^{-\lambda t} \tag{1}$$

where $\lambda$ is the mean failure rate.
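
The exponential relationship of (1) can be illustrated numerically. The following is a minimal sketch, not part of the original paper; the function names and the finite-difference step `dt` are illustrative assumptions.

```python
import math


def failure_free_probability(failure_rate, t):
    """P_0 = exp(-lambda * t), as in (1); failure_rate is the mean failure rate."""
    return math.exp(-failure_rate * t)


def initial_slope(failure_rate, dt=1e-6):
    """Finite-difference estimate of the slope of P_0 at t = 0 (hypothetical helper)."""
    return (failure_free_probability(failure_rate, dt) - 1.0) / dt
```

For a mean failure rate of 0.002 per hour, the probability of failure-free operation over 500 hours is exp(-1), and the estimated slope at the origin is close to -0.002, in agreement with the statement that the slope at t = 0 is the negative of the mean failure rate.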

This relationship is predicated on several as- 
sumptions. The first of these is that all failures 
are random and occur in accordance with a Pois- 
son distribution. All other contributory causes of 
failure are presumed to have been eliminated by 
design review or by the time-honored engineering 
practice of “debugging.” Because the empirical 
data which support this exponential relationship 
were derived under certain conditions, it is nec- 
essary that these same conditions apply when the 
exponential relationship is used for the prediction 
of machine expectation. These conditions are 
1) that the machine be subjected to continuous 
and/or repeated operation, 2) that the machine 
be subjected to continuous and/or repeated service,
and 3) that the environmental stress be similar
to that under which the basic data were
gathered.

In determining the probability of a machine 
being able to fulfill its operational requirement, 
it is exceedingly important to point out that the 
occurrence of a single component failure is rare- 
ly enough cause for complete machine breakdown. 
Moreover, depending on the particular structure 
of the machine and on the location of failed components,
more than one component failure can
occur without causing machine breakdown. This 
characteristic lends itself to mathematical de- 
velopment if the following quantities are defined: 


$P_0$ = the probability of failure-free operation
of a complete machine

$P_i$ = the probability of failure-free operation
of the ith function

$Q_i$ = the probability of failed operation of the
ith function ($= 1 - P_i$)

$R_i$ = the effectiveness of the machine under
the conditions that the ith function has
failed and all other functions are operable

$R_0$ = the effectiveness of the machine under
failure-free conditions; for most purposes
this quantity can be considered to be unity.

† System Dev. Corp., Santa Monica, Calif.


It follows then that the total capability of the machine,
viewed in the light of operating in the face
of partial failure is

$$P_c = P_0 R_0 + \sum_i P_0 \frac{Q_i}{P_i} R_i \tag{2}$$

where i can be taken to describe a particular machine
function. The $P_i$ factors may be determined
from the components merely by adding the respec- 
tive mean failure rates of those components which 
lie in each functional path and applying the sum in 
the exponential formula. Rigorous mathematical 
representation requires that the probability of 
simultaneous failure be included. This probability 
tends to be so low as to become insignificant with- 
in the accuracy of presently measurable mean 
failure rates. 
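
The capability calculation described above can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: the function names are hypothetical, each function's reliability is taken as the exponential of the summed failure rates in its path, and the machine's failure-free probability is assumed to be the product of the function reliabilities (that is, each component lies in exactly one functional path). Simultaneous failures are neglected, as in the text.

```python
import math


def function_reliability(failure_rates, t):
    """P_i: exponential formula applied to the summed failure rates of one path."""
    return math.exp(-sum(failure_rates) * t)


def total_capability(function_rates, effectiveness, t, r0=1.0):
    """Capability with partial failure: P_0*R_0 + sum_i P_0*(Q_i/P_i)*R_i.

    function_rates: list of lists of component failure rates, one list per function.
    effectiveness:  R_i for each function (machine effectiveness with only i failed).
    """
    p = [function_reliability(rates, t) for rates in function_rates]
    # Assumption: the machine is failure-free only if every function is failure-free.
    p0 = math.prod(p)
    degraded = sum(p0 * (1.0 - pi) / pi * ri for pi, ri in zip(p, effectiveness))
    return p0 * r0 + degraded
```

With every $R_i$ set to zero the result reduces to $P_0$; with partial effectiveness the result exceeds $P_0$, which is the sense in which a single failure need not mean complete machine breakdown.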

The total capability of an operator-controlled 
machine can be measured as a function of the in- 
formation presented to the operator. If it is as- 
sumed that complete presentation of information 
defines maximum capability, it becomes possible 
to quantify the relative loss of capability which 
results when various bits of information cannot be 
presented. A method for establishing system 
capability on the basis of audio-visual displays 
and controls to the operator can be inferred here. 
The execution of such a calculation requires de- 
tailed analyses such as can best be performed by 
those close to the design of the machine. 

The expectation, E, of satisfactory machine 
performance when required can then be deter- 
mined as the product of three probabilities: the 
probability that a machine is neither awaiting nor 
undergoing service; the probability that a machine 
which is neither awaiting nor undergoing service 
is truly ready; and the probability that a machine 
which is truly ready will be able to fulfill its 



operational requirement. omy T= T+ (at - 2) = (9) 
Another measure of expectation is that it is 

equal to the ratio of the number of machines re- 

quired to fulfill an operational requirement to the so that 

number of machines which must be initially pro- (T, - T") make 

cured in anticipation of the operational require- t : : a r es (10) 

ment: AT E27) a TSanON 


w 1 
E = Po PaPe (3) 
where 
a f Tye ap (4) 
OL ap , 
t 
$P_a$ is the assessibility, and

$$P_c = P_0 + A = e^{-\lambda t} + \sum_i P_0 \frac{Q_i}{P_i} R_i. \tag{5}$$

Let

$$A = \sum_i P_0 \frac{Q_i}{P_i} R_i. \tag{6}$$


Let $N_r = EN$ where $N_r$ is the number of machines
necessary to meet an operational requirement
and $N$ is the number of machines procured
in anticipation of the operational requirement.
Further, let $N_r = E'N'$ where $E'$ and $N'$ have the
same definition as before but their values
have been modified by changing one of the factors
which contribute to down time, in this case $\lambda$, the
mean rate at which replacements become necessary:


$$N_r = EN = E'N' \tag{7}$$


PhyPi Pi (T- TYPL(Ph +4) 


(T, - T)(Pt, + A") 
~ (T, - TP5 +4) (8) 


It is assumed that $P_a' = P_a$ when only the mean
replacement failure rate is changed:


lwhere P, is defined as the ratio of the number of 
components energized in self-test to the total number 
of components, this equation becomes 


% Qi 
pipetP oe PaPod Por(icib.)> ob PB, Rj 
j 
when the jth machine function contains components in its 
path which are not energized in self-test. In this case, 
the ith machine functions are presumed to contain only 
those components which are energized in self-test. 


where $\lambda' - \lambda$ is the change effected in the mean
replacement failure rate:


e a /a\ 


The change in the necessary number of initial machines
which owes itself to the change from $\lambda$ to
$\lambda'$ may be written as


N ah teint 
Neads zac 


e +A 
F BAK oy) (4 )|- {sam 


where $T$ and $\partial T/\partial\lambda$ are developed as follows.


Machines are not on-line during the period they 
are awaiting or undergoing service. Service may 
be scheduled, occurring at regular time intervals, 
or it may be unscheduled necessitated by the ran- 
dom occurrence of machine failure: 


$$T = T_s + T_u \tag{13}$$


where


$T$ = mean total down time expected during a
given time period $T_t$;

$T_s$ = mean total time in scheduled service expected
during $T_t$;

$T_u$ = mean total time in unscheduled service
expected during $T_t$.

The mean total time in scheduled service during
$T_t$ is the sum of the following five parts:


1) The mean number of scheduled servicings, 
Np, multiplied by the mean total time necessary — 
to unbutton the machine, hook up support equip- 
ment used in scheduled service, unhook support 
equipment and rebutton the machine. Part 1), of 
scheduled service may be written as: 


v 
Nota = M ta (14) 


y = the mean total number of operational 
hours expected on the system during ay 

M = the mean total number of operational 
hours allowed between scheduled serv- 
icings 

t, = the mean total time required to unbutton, 
hook up, unhook and rebutton the machine 
at each servicing. 


2) The mean number of scheduled servicings
multiplied by the mean time required to perform
service on the machine, provided that no adjustments
or replacements are necessary. Part 2) of
scheduled service may be written as:


vy 


Nob =a th (15) 


where


th = the mean total time required to perform 
service on a machine which has been 
rendered accessible, provided that no ad- 
justments or replacements are necessary. 


3) The mean number of replacements found
necessary to the machine at the beginning of
scheduled service, multiplied by the mean
total time necessary to make each replacement.
This is a function of the probability of detecting a
failure during scheduled service called “thoroughness,”
and of the probability that a given replacement
will not require further adjustment called
“interchangeability,” and of the probability of
having detected a failure other than during
scheduled service called “assessibility.” Thus,


Np(1 - P,) Piety, 
where


Np = the mean total number of replacements 
necessary at the beginning of scheduled 
service 

P._ = the probability of failure detection at 

times other than scheduled service 

called assessibility 

P,, = the probability of failure detection dur- 
ing scheduled service of time duration 
tp, this may be called thoroughness 

Tp = the mean total time required for a re- 
placement to be made. 


This may also be written as


Np(1 - 1B PpPalp + Np(1 - PP.) - Py) P.(T, + Tp) 
where 


Py = the probability that a given replacement 
does not require further adjustment 
called interchangeability 

T, = the mean total time required for an ad- 
justment to be made 

Ap = defined as the mean rate at which replace- 
ments are necessary to the machine. 


Part 3) of scheduled service may be written as: 
MAp(1 - Pa) PphPeTy 


= MA,(1 - Patt - Pi) Rare + Tp). (16) 

4) The mean number of adjustments found nec- 
essary at the beginning of scheduled service, N,, 
multiplied by the mean total time required for an 
adjustment to be made. Previous scheduled serv- 
ice is assumed to have insured that all necessary 
adjustments were made. Pz, is the probability that 
a given adjustment will obtain during M opera- 
tional hours called “adjustability.” Part 4) of 
scheduled service may be written as: 


where 


Ag = the mean rate at which adjustments are 
necessary to the machine. Where 
Ag < 1/M, Py = 1. Pg is less than 


unity whenever Ag > 1/M. 


5) The mean number of replacements necessi- 
tated by the fact that the machine is being operated 
during the scheduled service, multiplied by the 
mean total time required for each replacement. 
The environmental stress on the machine during 
scheduled service is such that it can be assumed 
that no failure resulting from loss of adjustment 
will occur once the adjustment has been made, and 
that any failure which occurs will be of the type 
which requires replacement. Further, it shall be 
assumed that the nature of scheduled service af- 
fords such minimum environmental stress on the 
machine and is of such relatively short time dura- 
tion, t,, that no failures will occur. Part 5) of 
scheduled service may be written as: 


SNoPpP eT + sN,(1 - Py) Pavia + Ty (18) 


where 




s = the mean ratio of environmental stress 
on the machine during service to the en- 
vironmental stress on the machine dur- 
ing operation. 

N, = the mean number of replacements nec- 

essitated during that part of scheduled 

service where environmental conditions 
might be expected to cause failures. 




The mean total time in unscheduled service 
during T; is made up of the following three parts: 


1) The mean total number of adjustments found 
to be necessary during the time period between 
scheduled services, multiplied by a time which is 
the sum of: a) the mean total time required to 
make an adjustment; and b) the mean time, tj, 
which is required to detect a failure which has oc- 
curred between regularly scheduled services; plus 
c) t,, basic setup and set-down times. Part 1) of 
unscheduled service may be written as: 


N,(1 - eG +t, + te). 


2) The mean total number of replacements 
found to be necessary during machine operation 
between scheduled services, multiplied by the sum 
of mean total times to setup, detect failures, ad- 
just and replace. This mean number of replace- 
ments is a function of the assessibility, P,, and 
the interchangeability, P}). Part 2) of unscheduled 
service may be written as: 


(19) 


NpPs Pps + t, ar te) 


+ NpP,(1 - Pp)(T, + Th + ta + te)- (20) 

3) The mean total number of adjustments and 
replacements which were actually necessary at 
the time of scheduled service but which were 
missed because the thoroughness, P,, is less 
than unity. Part 3) of unscheduled service may be 
written as: 


ieaofipilhi, = eet teal (21) 


By combining the preceding equations an expres- 
sion for the total down time, T, is derived 


aR es a es + tp) + (22) 


2 2 
aj TaTptagTp+agTaTp+aqTeTpta5Tp+agT},+a7T 9 
Tp = ApSP TH = Aps(1 = PhP ae.) 




Eq. (22) enables the evaluation of the expected ef- 
fect on the mean total down time which variation 
of any of the contributing down-time parameters 
might cause. If $\lambda$ is the down-time parameter
whose variation is under study, 


2 3 ne 3 


Or we 


4 5 4 2mS 
+ b5Tp + bgTp + b7TaTp + bgTaTp 


@e 


Sear 2 2 


(23) 
we 
where, for convenience, 
wW = Tpl1 = \psP sty, = Aps\Lalp Betas 
=.Tp= Bis J ¢rTe (24) 
B = ApsPe. (25) 


Algebraic expressions for these constants have 
been derived and are available to those request- 
ing them. They have been omitted here for con- 
venience. 

The insertion of nominal or measured quantities
for the $\lambda$’s, P’s, T’s, M’s and s establishes
values for $T$ and for $\partial T/\partial\lambda$. Once these are
established, it is possible to measure the value of
any planned or expected change in $\lambda$ in terms of
a reduction in the number of initially procured
machines necessary to the fulfillment of an
operational requirement. Where the cost of reducing
the mean replacement failure rate, $\lambda$, is
less than the expected savings resulting from
the procurement of fewer initial systems, this
means of improving expectation should be explored.
This is true for all of the down-time
parameters and provides a basis for deciding
which down-time parameter will yield the
greatest improvement in over-all expectation
for a given investment of resources. Calculations
of expected savings in terms of machines
should include the cost of supporting such additional
machines as well as the cost of their
initial procurement.
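
The break-even comparison can be sketched in a few lines. This is an illustrative sketch only: the function names, the rounding up to whole machines, and the single cost figure per machine (taken here to cover both procurement and support) are assumptions, not quantities defined in the paper. It relies only on the relation that the number of machines required equals the expectation multiplied by the number procured.

```python
import math


def machines_to_procure(required, expectation):
    """N = N_required / E, rounded up to a whole machine (rounding is an assumption)."""
    return math.ceil(required / expectation)


def worth_reducing_failure_rate(required, expectation_before, expectation_after,
                                cost_per_machine, improvement_cost):
    """True when the savings from procuring fewer machines exceed the improvement cost.

    cost_per_machine is assumed to include support cost as well as procurement cost.
    """
    saved = (machines_to_procure(required, expectation_before)
             - machines_to_procure(required, expectation_after))
    return saved * cost_per_machine > improvement_cost
```

For example, if 90 machines are required and the expectation rises from 0.75 to 0.9375, the initial procurement falls from 120 to 96 machines, and the improvement breaks even only if it costs less than the 24 machines saved.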


Optimal Diagnostic Procedures*


B. B. WINTER†


Summary: In recent papers,1,2 optimal diagnostic
procedures are presented for some special
cases. In this paper, we present an optimal diagnostic
procedure under a different restriction,
i.e., we consider equipment in which elements can
only be tested one at a time, or all at once. Optimality
is in the sense of minimum expected cost.


I. MODEL AND TERMINOLOGY 


Consider an equipment which consists of N,
N > 1, elements (as defined by Brulé, et al.1).
The elements fail independently. The failure of
one element does not cause the equipment to cease
functioning (though it is now functioning “erroneously”)
and, thus, can be followed by failures of
other elements at subsequent times. An over-all
test can be applied to the equipment, such that it
passes if and only if all elements are good. At
some time, following the failure of at least one
element, the equipment is subjected to diagnosis
in the following manner.

With the equipment known to be bad (i.e., one
or more elements are bad), we test the elements
one at a time. Whenever we encounter a bad element,
we replace (or repair) it and then apply the
over-all test to determine whether we should continue
with the testing of individual elements. If,
subsequent to the replacement of some element k,
the equipment fails on the over-all test and elements
k+1, k+2, . . ., N-1 are all found good,
then the element N is known to be bad “by elimination”
and need not be actually tested. Furthermore,
if the element N is thus found to be bad,
the over-all test need not be performed after replacement
of that element. The last assertion
arises from the implicit assumption that no element
can fail during diagnosis.

* This work was performed in connection with U.S. Army
Signal Corps Contract DA 36-039-SC-75084.

† Aeronutronic, a Div. of the Ford Motor Co., Newport
Beach, Calif.

1 J. D. Brulé, R. A. Johnson, and E. J. Kletsky, “Diagnosis
of equipment failures,” IRE TRANS. ON RELIABILITY
AND QUALITY CONTROL, vol. RQC-9,
pp. 23-34; April, 1960.

2 R. A. Johnson, “An information theory approach to
diagnosis,” Proc. Sixth Natl. Symp. on Reliability
and Quality Control, Washington, D.C.; January 11-13,
1960.

Since the equipment continues functioning after 
the failure of one or more elements, we can only 
detect equipment failure by periodic examinations 
of the equipment (such examinations can, of course, 
be performed by application of the over-all test). 
When the equipment is found bad by such an ex- 
amination, we then engage in the diagnostic pro- 
cedure described above. 

Let $p_i(t)$ be the a priori probability that element
i is bad, if the equipment passed examination
at time $t_0$ and is about to be examined at
time $t$; then

$$p_i(t) = \int_{t_0}^{t} F_i'(u)\,du \Big/ [1 - F_i(t_0)] = [F_i(t) - F_i(t_0)] / [1 - F_i(t_0)],$$

where $F_i$ is the failure distribution function for
element i. If element i is a replacement element,
present in the equipment only since some time $y_i$,
then

$$p_i(t) = [F_i(t - y_i) - F_i(t_0 - y_i)] / [1 - F_i(t_0 - y_i)].$$

Note that $p_i(t)$ is a conditional probability, conditional
on the event of passing at time $t_0$, but is
a priori with respect to the state of affairs at $t$
since it refers to the probability prior to any
knowledge as to whether the equipment is bad at $t$.
In the sequel, we write $p_i$ for $p_i(t)$ and $q_i$ for
$1 - p_i(t)$.
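
As an illustration of this definition, the sketch below evaluates the a priori probability for an element whose failure distribution is exponential. The exponential form, the function names, and the `installed_at` parameter (standing for the replacement time $y_i$) are assumptions made for the example; the paper leaves $F_i$ general.

```python
import math


def exponential_cdf(rate):
    """Return F(t) = 1 - exp(-rate * t) for t > 0 (assumed failure distribution)."""
    def cdf(t):
        return 1.0 - math.exp(-rate * t) if t > 0 else 0.0
    return cdf


def prior_bad_probability(cdf, t, t0, installed_at=0.0):
    """p_i(t) = [F(t - y) - F(t0 - y)] / [1 - F(t0 - y)], with y = installed_at."""
    f_t = cdf(t - installed_at)
    f_t0 = cdf(t0 - installed_at)
    return (f_t - f_t0) / (1.0 - f_t0)
```

For an exponential distribution the result depends only on the interval since the last passed examination: with a rate of 0.01 per hour, a passed examination at 100 hours and the next at 150 hours, the probability is 1 - exp(-0.5), whether or not the element was replaced in the meantime.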

With each test or repair we associate a fixed 
“cost,” e.g., length of time to perform the test or 
repair in question. The quantity which is taken to 
be the cost of an operation must have the proper- 
ties: 


1) the cost associated with an operation takes 
on the value zero if the operation is not per- 
formed, and takes on some fixed value if the 
operation is performed. 

2) the costs are linear; the cost associated with 
a complete testing and repairing sequence is 
the sum of the costs incurred in connection 
with the individual operations. 


The following notation is used, with i = 1, 2, . . ., N:

$f(S_k)$              expected cost of the testing sequence $S_k$;
$T$                   cost of over-all test;
$\tau_i$              cost of testing ith element;
$\rho_i$              cost of repairing (or replacing) ith element;
$p_i$                 a priori probability that element i is bad;
$q_i = 1 - p_i$       a priori probability that element i is good;
$Q_i = \prod_{j=i}^{N} q_j$   a priori probability that elements
                      i, i+1, . . ., N are all good.


To avoid triviality, we require $0 < p_i < 1$, all i.
As a convenience, we write

$$p_0 = \tau_0 = \rho_0 = 0 \quad\text{and}\quad q_0 = 1$$

for a fictitious zeroth element.

The numbering of the elements is fundamentally
arbitrary. As a convenience, let us require
that the elements be numbered in such a manner
that

$$1 \le i < j \le N \quad\text{implies}\quad \frac{q_i}{p_i}\,\tau_i \le \frac{q_j}{p_j}\,\tau_j. \tag{1}$$


II. EXPECTED COST


Let $S_N$ denote the testing sequence in which
the elements are tested in the order in which they
are numbered, i.e., element No. 1 is tested first,
etc. (This sequence is in fact implied in the model
description in Section I.)

Let us introduce the set of stochastic variables
$\{x_i \mid i = 1, 2, \ldots, N\}$ to represent the cost actually
incurred in connection with each of the elements,
i.e., testing it and, if necessary, repairing
or replacing it. Then, for $i = 1, 2, \ldots, N-1$,

$$x_i = \begin{cases}
0 & \text{if the testing sequence terminated before element } i; \\
\tau_i & \text{if the testing sequence did not terminate before element } i, \text{ and that element is good;} \\
\tau_i + \rho_i + T & \text{if the testing sequence did not terminate before element } i, \text{ and that element is bad;}
\end{cases}$$

and

$$x_N = \begin{cases}
0 & \text{if the testing sequence terminated before element } N; \\
\rho_N & \text{if the testing sequence did not terminate before element } N.
\end{cases}$$

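Now, we seek to establish the probability with
which each $x_i$ takes on each of its possible values.
Since the elements fail independently, the condition
that the equipment was good at $t_0$ does not affect
the multiplication rule. However, the condition
that the equipment is bad at $t$ eliminates from
the event space the possibility of all elements
being good at $t$.

Let A denote the event “$x_i = \tau_i$;” let B denote
the event “$x_i \ne 0$;” let E denote the event
“the equipment is bad.” Note that A implies B
which, in turn, implies E. Therefore

$$\Pr\{A\} = \Pr\{A \text{ and } B\} = \Pr\{A \text{ and } B \text{ and } E\}, \qquad \Pr\{B\} = \Pr\{B \text{ and } E\}.$$

Also note that, for any events K, L and M,

$$\Pr\{K \text{ and } L \mid M\} = \Pr\{L \mid M\}\,\Pr\{K \mid L \text{ and } M\}$$

and

$$\Pr\{K \mid L \text{ and } M\} = \Pr\{K \text{ and } L \text{ and } M\} / \Pr\{L \text{ and } M\}.$$

Therefore

$$\begin{aligned}
\Pr\{A \mid E\} &= \Pr\{A \text{ and } B \mid E\} = \Pr\{B \mid E\}\,\Pr\{A \mid B \text{ and } E\} \\
&= \Pr\{B \mid E\}\,\frac{\Pr\{A \text{ and } B \text{ and } E\}}{\Pr\{B \text{ and } E\}}
 = \Pr\{B \mid E\}\,\frac{\Pr\{A \text{ and } B\}}{\Pr\{B\}} \\
&= \Pr\{B \mid E\}\,\Pr\{A \mid B\},
\end{aligned}$$

i.e., for $1 \le i < N$,

$$\Pr\{x_i = \tau_i \mid E\} = \frac{1 - Q_i}{1 - Q_1} \cdot \frac{q_i - Q_i}{1 - Q_i} = \frac{q_i - Q_i}{1 - Q_1}.$$

Similarly,

$$\Pr\{x_i = \tau_i + \rho_i + T \mid E\} = \frac{1 - Q_i}{1 - Q_1} \cdot \frac{p_i}{1 - Q_i} = \frac{p_i}{1 - Q_1},$$

and

$$\Pr\{x_i = 0 \mid E\} = 1 - \Pr\{x_i \ne 0 \mid E\} = 1 - \frac{1 - Q_i}{1 - Q_1} = \frac{Q_i - Q_1}{1 - Q_1},$$

$$\Pr\{x_N = \rho_N \mid E\} = \frac{1 - q_N}{1 - Q_1}, \qquad \Pr\{x_N = 0 \mid E\} = \frac{q_N - Q_1}{1 - Q_1}.$$

To summarize, we have, with the equipment known
to be bad,

$$x_i = \begin{cases}
0 & \text{with probability } \dfrac{Q_i - Q_1}{1 - Q_1} \\[2mm]
\tau_i & \text{with probability } \dfrac{q_i - Q_i}{1 - Q_1} \\[2mm]
\tau_i + \rho_i + T & \text{with probability } \dfrac{p_i}{1 - Q_1}
\end{cases} \qquad i = 1, 2, \ldots, N-1,$$

and

$$x_N = \begin{cases}
0 & \text{with probability } \dfrac{q_N - Q_1}{1 - Q_1} \\[2mm]
\rho_N & \text{with probability } \dfrac{1 - q_N}{1 - Q_1}.
\end{cases}$$

Now we have, for the expected cost,

$$f(S_N) = E\Big(\sum_{i=1}^{N} x_i\Big) = \sum_{i=1}^{N} E(x_i),$$

$$\begin{aligned}
(1 - Q_1)\,f(S_N) &= \sum_{i=1}^{N-1} \big[\tau_i (q_i - Q_i) + (\tau_i + \rho_i + T)\,p_i\big] + \rho_N p_N \\
&= \sum_{i=1}^{N} \rho_i p_i + T \sum_{i=1}^{N-1} p_i + \sum_{i=1}^{N-1} \tau_i (1 - Q_i) \\
&= \sum_{i=1}^{N} p_i \rho_i + T \sum_{i=1}^{N-1} p_i + \sum_{i=1}^{N} \tau_i (1 - Q_i) - \tau_N (1 - q_N) \\
&= \sum_{i=1}^{N} p_i \rho_i + T \sum_{i=1}^{N} p_i + \sum_{i=1}^{N} \tau_i (1 - Q_i) - p_N (\tau_N + T).
\end{aligned} \tag{2}$$


III. OPTIMIZATION


Any testing sequence can be derived from the
sequence $S_N$ by successive permutations of adjoining
elements. To say that elements i and i+1
are permuted with respect to $S_N$ is to say that the
elements are to be tested in the order
1, 2, . . ., i-1, i+1, i, i+2, . . ., N.

Consider the sequence $S_N'$, obtained from $S_N$
by permuting the adjoining elements r and s,
$s = r+1 < N$. We have

$$\begin{aligned}
(1 - Q_1)\,f(S_N') &= \sum_{i=1}^{N} p_i \rho_i + T \sum_{i=1}^{N} p_i - p_N(\tau_N + T) + \sum_{i=1}^{r-1} \tau_i (1 - Q_i) \\
&\quad + \tau_s (1 - Q_r) + \tau_r (1 - q_r Q_{s+1}) + \sum_{i=s+1}^{N} \tau_i (1 - Q_i) \\
&= \sum_{i=1}^{N} p_i \rho_i + T \sum_{i=1}^{N} p_i - p_N(\tau_N + T) + \sum_{i=1}^{N} \tau_i (1 - Q_i) \\
&\quad + \tau_s (1 - q_r q_s Q_{s+1}) + \tau_r (1 - q_r Q_{s+1}) - \tau_r (1 - q_r q_s Q_{s+1}) - \tau_s (1 - q_s Q_{s+1}) \\
&= (1 - Q_1)\,f(S_N) + Q_{s+1}\big[\tau_s q_s (1 - q_r) - \tau_r q_r (1 - q_s)\big] \\
&= (1 - Q_1)\,f(S_N) + p_r p_s Q_{s+1} \Big[\frac{\tau_s q_s}{p_s} - \frac{\tau_r q_r}{p_r}\Big].
\end{aligned} \tag{3}$$

Now, since r < s and since the elements are
numbered according to (1),

$$\frac{\tau_r q_r}{p_r} \le \frac{\tau_s q_s}{p_s};$$

also,

$$p_r p_s Q_{s+1} > 0 \quad\text{and}\quad 1 - Q_1 > 0;$$

therefore (3) implies

$$f(S_N') \ge f(S_N).$$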


Thus 1) no sequence in which two adjoining elements $i-1$ and $i$, $i < N$, fail to satisfy condition (1) can have a lower expected cost than the sequence $S_N$; and 2) any sequence in which two adjoining elements $i-1$ and $i$, $i < N$, do satisfy condition (1) has the same expected cost as the sequence $S_N$.

Therefore, we have shown that the sequence $S_N$ is optimal among all sequences in which element $N$ is tested last.

Let us examine the expected cost of testing the elements in the order $1, 2, \ldots, k-1, k+1, \ldots, N, k$, a testing sequence which will be referred to as $S_k$. Note that $S_k$ differs from $S_N$ only in that element $k$ is tested last. We have, by analogy with (2),


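$$\begin{aligned}
(1-Q_1) f(S_k) &= \sum_{i=1}^N \rho_i p_i + \sum_{i=1}^{k-1}\tau_i(1-Q_i) + \sum_{i=k+1}^N \tau_i(1 - q_k Q_i) + T\Bigl(\sum_{i=1}^N p_i - p_k\Bigr) \\
&= \sum_{i=1}^N \rho_i p_i + \sum_{i=1}^{N}\tau_i(1-Q_i) + T\sum_{i=1}^N p_i + \sum_{i=k+1}^N \tau_i(1 - q_k Q_i) - \sum_{i=k}^N \tau_i(1-Q_i) - T p_k \\
&= \sum_{i=1}^N \rho_i p_i + \sum_{i=1}^{N}\tau_i(1-Q_i) + T\sum_{i=1}^N p_i + \sum_{i=k}^N \tau_i Q_i(1 - q_k) - \tau_k(1 - q_k Q_k) - T p_k \\
&= \sum_{i=1}^N \rho_i p_i + T\sum_{i=1}^N p_i + \sum_{i=1}^{N}\tau_i(1-Q_i) + p_k\Bigl[\sum_{i=k}^N \tau_i Q_i - T\Bigr] - \tau_k(1 - q_k Q_k). \qquad (4)
\end{aligned}$$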


Since the first three sums in the last member of (4) are independent of $k$, $f(S_k)$ is a minimum when

$$g(S_k) = \min_{1 \le j \le N}\{g(S_j)\},$$

where

$$g(S_j) = p_j\Bigl[\sum_{i=j}^N \tau_i Q_i - T\Bigr] - \tau_j(1 - q_j Q_j). \qquad (5)$$


These findings may be summarized in the following algorithm: Number all the elements so that condition (1) is satisfied. Test the elements in the sequence $1, 2, \ldots, k-1, k+1, \ldots, N, k$ where $k$ is determined by condition (5). The expected cost of this testing sequence is a minimum, and is given by (4).
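The algorithm can be sketched in Python as follows. This is only an illustration under stated assumptions: elements fail independently with $0 < p_i < 1$, indices are zero-based, the ordering criterion is taken to be $\tau_i q_i/p_i$ in nondecreasing order, and the function names `expected_cost` and `optimal_sequence` are hypothetical, not from the paper.

```python
from math import prod


def expected_cost(order, tau, rho, p, T):
    """Expected diagnosis cost when the last element of `order` is found by elimination."""
    q = [1.0 - pi for pi in p]
    total = sum(rho[i] * p[i] for i in order)  # repair costs weighted by failure probability
    for pos, i in enumerate(order[:-1]):       # the last element is never tested
        some_remaining_bad = 1.0 - prod(q[j] for j in order[pos:])
        total += tau[i] * some_remaining_bad + T * p[i]
    return total / (1.0 - prod(q))             # condition on the equipment being bad


def optimal_sequence(tau, rho, p, T):
    """Order by tau*q/p, then move the element that minimizes g of (5) to the end."""
    n = len(p)
    q = [1.0 - pi for pi in p]
    ranked = sorted(range(n), key=lambda i: tau[i] * q[i] / p[i])
    # tail_q[m] plays the role of Q at position m: product of q over ranked[m:]
    tail_q = [1.0] * (n + 1)
    for m in range(n - 1, -1, -1):
        tail_q[m] = tail_q[m + 1] * q[ranked[m]]

    def g(m):
        j = ranked[m]
        weighted_tail = sum(tau[ranked[l]] * tail_q[l] for l in range(m, n))
        return p[j] * (weighted_tail - T) - tau[j] * (1.0 - q[j] * tail_q[m])

    k = min(range(n), key=g)
    order = ranked[:k] + ranked[k + 1:] + [ranked[k]]
    return order, expected_cost(order, tau, rho, p, T)
```

Calling `optimal_sequence(tau, rho, p, T)` returns the testing order and its expected cost; for a handful of elements the result can be compared against an exhaustive search over all permutations using `expected_cost`.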


IV. SIMPLIFICATIONS AND 
EXTENSIONS 


A. Inclusion of Superfluous Tests 


Assume that we do not rely on testing the last 
element “by elimination,” i.e., we test it even in 
instances in which the last element’s failure can 
be inferred from outcomes of previous tests. 
Furthermore, let us perform the over-all test 
after all elements have either been found good or 
have been repaired (even though this test is now 
superfluous). As in the original model, we let $x_i$ be the cost actually incurred in connection with element $i$; we then have

$$x_i = \begin{cases} 0 & \text{with probability } \dfrac{Q_i - Q_1}{1-Q_1}, \\[2mm] \tau_i & \text{with probability } \dfrac{q_i - Q_i}{1-Q_1}, \\[2mm] \tau_i + \rho_i + T & \text{with probability } \dfrac{p_i}{1-Q_1}, \end{cases} \qquad i = 1, 2, \ldots, N,$$

and

$$(1-Q_1) f(S_N) = \sum_{i=1}^N \tau_i(1-Q_i) + \sum_{i=1}^N (T + \rho_i)\,p_i. \qquad (6)$$


Define $S'_N$ as before, but with $s = r+1 \le N$ (in the original model, strict inequality held); we find that

$$f(S'_N) \ge f(S_N),$$

thus showing that the sequence $S_N$ is optimal among all possible testing sequences. This result yields the following algorithm: Number all the elements so that condition (1) is satisfied. Test the elements in the sequence $1, 2, \ldots, N$ and in accordance with the above assumptions. The expected cost of this testing sequence is a minimum, and is given by (6).
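A short Python sketch of this simplified rule, assuming independent failures with $0 < p_i < 1$, zero-based indices, and the ordering criterion $\tau_i q_i/p_i$; the function names are illustrative only.

```python
from math import prod


def order_by_criterion(tau, p):
    """Element indices sorted by tau_i * q_i / p_i, smallest first."""
    return sorted(range(len(p)), key=lambda i: tau[i] * (1.0 - p[i]) / p[i])


def expected_cost_all_tested(order, tau, rho, p, T):
    """Expected cost in the form of (6): every element is tested when it is reached."""
    q = [1.0 - pi for pi in p]
    numerator = sum((T + rho[i]) * p[i] for i in order)
    for pos, i in enumerate(order):
        # element i is tested only if some element from position pos onward is bad
        numerator += tau[i] * (1.0 - prod(q[j] for j in order[pos:]))
    return numerator / (1.0 - prod(q))
```

Only the $\tau_i(1-Q_i)$ terms depend on the order, so the repair costs and $T$ do not influence which sequence is chosen in this model.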


B. Single Element Failure

Assume that one and only one element is bad when the equipment is subjected to diagnosis. This assumption can be made if and only if 1) the equipment fails as soon as any one element fails; and 2) no elements can fail after the first element failure.

It is conceivable that equipment satisfying assumptions 1) and 2) can have the further property that any equipment malfunction becomes immediately apparent, thus precluding the necessity of periodically examining the equipment for possible malfunction. If that is the case, we have

$$p_i = F_i(t).$$

If the equipment does have to be examined periodically (due to the absence of automatic failure indication), then $p_i$ is as given in Section I.

In this model, we let $x$ be the total cost of locating and repairing the single bad element. The conditional probability that element $i$ is bad, given that one and only one element is bad, is

$$\frac{p_i \prod_{j \ne i} q_j}{\sum_{k=1}^N p_k \prod_{j \ne k} q_j} = \frac{p_i/q_i}{\sum_{k=1}^N p_k/q_k}.$$

Then

$$x = \rho_i + \sum_{j=1}^{i} \tau_j \quad \text{w.p.} \quad \frac{p_i/q_i}{\sum_{k=1}^N p_k/q_k} \qquad (i = 1, 2, \ldots, N).$$


As before, let $S_N$ be the testing sequence in which the elements are tested in the order in which they are numbered; and let $S'_N$ be the testing sequence which differs from $S_N$ in that two adjoining elements $r$ and $s$, $s = r+1 \le N$, are permuted. We then have

$$f(S_N) = E(x) = \sum_{i=1}^N \frac{p_i/q_i}{\sum_{k=1}^N p_k/q_k}\Bigl(\rho_i + \sum_{j=1}^{i}\tau_j\Bigr),$$

$$\Bigl(\sum_{k=1}^N \frac{p_k}{q_k}\Bigr) f(S_N) = \sum_{i=1}^N \frac{p_i}{q_i}\rho_i + \sum_{i=1}^N \frac{p_i}{q_i}\Bigl(\sum_{j=1}^{i}\tau_j\Bigr) \qquad (7)$$

and

$$\begin{aligned}
\Bigl(\sum_{k=1}^N \frac{p_k}{q_k}\Bigr) f(S'_N) &= \sum_{i=1}^N \frac{p_i}{q_i}\rho_i + \sum_{i=1}^N \frac{p_i}{q_i}\Bigl(\sum_{j=1}^{i}\tau_j\Bigr) + \frac{p_r}{q_r}\tau_s - \frac{p_s}{q_s}\tau_r \\
&= \Bigl(\sum_{k=1}^N \frac{p_k}{q_k}\Bigr) f(S_N) + \frac{p_r}{q_r}\tau_s - \frac{p_s}{q_s}\tau_r.
\end{aligned}$$

Since $r < s$, and the elements are numbered according to (1),

$$\frac{p_r}{q_r}\tau_s - \frac{p_s}{q_s}\tau_r \ge 0;$$

therefore

$$f(S'_N) \ge f(S_N),$$

thus showing that the expected cost, given by (7), is a minimum when the elements are numbered according to (1) and are tested in the sequence $1, 2, \ldots, N$.

If we allow for testing “by elimination,” we have

$$\Bigl(\sum_{k=1}^N \frac{p_k}{q_k}\Bigr) f(S_N) = \sum_{i=1}^N \frac{p_i}{q_i}\rho_i + \sum_{i=1}^N \frac{p_i}{q_i}\sum_{j=1}^{i}\tau_j - \frac{p_N}{q_N}\tau_N.$$

Following the same development as before, we find that the optimal test sequence $S_k$ is $1, 2, \ldots, k-1, k+1, \ldots, N, k$, with the elements numbered according to (1) and $k$ determined by (8):

$$h(k) = \min_{1 \le j \le N}\{h(j)\}, \quad\text{where}\quad h(j) = \frac{p_j}{q_j}\sum_{i=j+1}^N \tau_i - \tau_j\sum_{i=j+1}^N \frac{p_i}{q_i} - \tau_j\frac{p_j}{q_j}. \qquad (8)$$
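The single-failure rule with testing by elimination can be sketched in Python as below. The sketch assumes $0 < p_i < 1$, zero-based indices, and the ordering criterion $\tau_i q_i/p_i$; the selection function `h` follows the cost difference obtained when one element is moved to the end of the sequence, and all names are hypothetical.

```python
def single_failure_cost(order, tau, rho, p):
    """Expected cost of locating and repairing the single bad element.

    The last element of `order` is identified by elimination and is not tested.
    """
    odds = {i: p[i] / (1.0 - p[i]) for i in order}  # p_i / q_i
    spent_on_tests = 0.0
    weighted_cost = 0.0
    for pos, i in enumerate(order):
        if pos < len(order) - 1:
            spent_on_tests += tau[i]
        weighted_cost += odds[i] * (rho[i] + spent_on_tests)
    return weighted_cost / sum(odds.values())


def single_failure_sequence(tau, rho, p):
    """Rank by tau*q/p, then move the element minimizing h to the end."""
    n = len(p)
    odds = [pi / (1.0 - pi) for pi in p]
    ranked = sorted(range(n), key=lambda i: tau[i] / odds[i])

    def h(m):
        j = ranked[m]
        later = ranked[m + 1:]
        return (odds[j] * sum(tau[i] for i in later)
                - tau[j] * sum(odds[i] for i in later)
                - tau[j] * odds[j])

    k = min(range(n), key=h)
    order = ranked[:k] + ranked[k + 1:] + [ranked[k]]
    return order, single_failure_cost(order, tau, rho, p)
```

The repair costs enter the expected cost but not the choice of sequence, since each $\rho_i$ is incurred with the same probability whatever the order.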


C. Multilevel Equipment 


Assume that the equipment is “modularized” at several “levels,” i.e., that the equipment consists of $N$ modules; that the $m$th module consists of $i_m$ submodules, $m = 1, 2, \ldots, N$; that each module in turn consists of some number of sub-submodules, etc. If, at each level, testing can be performed in accordance with our restrictions (either as stated in Section I, or as stated in Sections IV-A or IV-B, above) then we can use the above developed rules at each level, provided we adjust the “costs” accordingly, as follows.

When determining the sequence in which to test the modules, let $T$ be the cost of an over-all test of the equipment, let $\tau_i$ be the cost of testing the $i$th module as a whole, and let $\rho_i$ be the expected cost of testing and repairing the entire collection of submodules of the $i$th module. When determining the sequence in which to test the submodules of the $i$th module, let $T$ be the cost of an over-all test of that module, let $\tau_j$ be the cost of testing the $j$th submodule of the $i$th module as a whole, and let $\rho_j$ be the expected cost of testing and repairing the sub-submodules of the $j$th submodule of the $i$th module, etc.

That the previously developed rules can be thus 
applied is an immediate consequence of Bellman’s 
Principle of Optimality.3 


V. INDETERMINATE TESTS 
We have up to now considered tests whose only possible outcome is “pass” or “fail.” In fact, it is possible that a test have an indeterminate outcome, neither pass nor fail, i.e., that the application of the test yields no information whatsoever. Let $z_i$ be the probability that testing the $i$th item will result in an indeterminate outcome, and let a given test be applied repeatedly until it results in an unambiguous outcome. If $\theta_i$ is the total cost of testing the $i$th item under this regime, then

$$E(\theta_i) = \sum_{n=1}^{\infty} (n\tau_i)\, z_i^{\,n-1}(1 - z_i) = \frac{\tau_i}{1 - z_i}.$$

All the previously derived results hold under this regime provided that $\tau_i$ is replaced by $\tau_i/(1 - z_i)$.

3 R. Bellman, “Dynamic Programming,” Princeton University Press, Princeton, N.J.; 1957.
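As a numerical check of the closed form, the following Python sketch compares a truncated version of the series with $\tau_i/(1 - z_i)$. The function names and the truncation length are illustrative assumptions.

```python
def adjusted_test_cost(tau, z):
    """Closed-form expected cost of repeating a test until it is unambiguous."""
    return tau / (1.0 - z)


def series_test_cost(tau, z, terms=200):
    """Partial sum of sum_{n>=1} (n*tau) * z**(n-1) * (1-z)."""
    return sum(n * tau * z ** (n - 1) * (1.0 - z) for n in range(1, terms + 1))
```

For example, with a hypothetical test cost of 3 units and an indeterminate-outcome probability of 0.2, both functions give a value close to 3.75.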


VI. ESTIMATION OF $p_i$

As stated in Section I,

$$p_i(t) = [F_i(t) - F_i(t_0)]/[1 - F_i(t_0)]$$

for original elements, and

$$p_i(t) = [F_i(t - y_i) - F_i(t_0 - y_i)]/[1 - F_i(t_0 - y_i)]$$

for replacement elements operating since the time $y_i$. Thus the estimation of $p_i$ has as a prerequisite the estimation of the underlying failure distributions of the elements.

The estimation of the $p_i$ is somewhat simplified if one assumes that 1) the elements fail in a random fashion, i.e., their failure distribution is the exponential distribution; and 2) the interval between examinations is constant, say $w$ time units (following a diagnosis, the next examination is performed $w$ time units after the end of diagnosis).


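Then

$$p_i(t) = \frac{F_i(t) - F_i(t-w)}{1 - F_i(t-w)} = \frac{1 - e^{-t/\lambda_i} - [1 - e^{-(t-w)/\lambda_i}]}{1 - [1 - e^{-(t-w)/\lambda_i}]} = \frac{e^{-(t-w)/\lambda_i}\,[1 - e^{-w/\lambda_i}]}{e^{-(t-w)/\lambda_i}} = 1 - e^{-w/\lambda_i},$$

where $\lambda_i$ is the mean life of the $i$th component.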
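The cancellation of $t$ can be checked numerically. The Python sketch below evaluates the conditional failure probability directly from the exponential distribution function and compares it with $1 - e^{-w/\lambda_i}$; the function names and the sample numbers in the usage note are illustrative assumptions.

```python
import math


def conditional_failure_probability(t, w, mean_life):
    """[F(t) - F(t-w)] / [1 - F(t-w)] for an exponential life with the given mean."""
    def F(x):
        return 1.0 - math.exp(-x / mean_life)
    return (F(t) - F(t - w)) / (1.0 - F(t - w))


def failure_probability_closed_form(w, mean_life):
    """The simplified expression 1 - exp(-w / mean_life)."""
    return 1.0 - math.exp(-w / mean_life)
```

With a hypothetical mean life of 10 hours and an examination interval of 1 hour, `conditional_failure_probability(2.0, 1.0, 10.0)` and `conditional_failure_probability(50.0, 1.0, 10.0)` agree with `failure_probability_closed_form(1.0, 10.0)` up to rounding, which reflects the memoryless property of the exponential distribution.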

Thus, under assumptions 1) and 2), and if $\lambda_i$ is known, $p_i(t)$ can be calculated once and for all. If $\lambda_i$ is not known, one can estimate $p_i$ by

$$\hat{p}_i = n_i/j,$$

where $n_i$ is the total number of times that element $i$ was found defective in all examinations, and $j$ is the total number of past examinations, including those in which the equipment was found good. Also, $q_i$ is estimated by

$$\hat{q}_i = 1 - \hat{p}_i.$$


In the “single failure” model, we indicated the possibility of not having to examine the equipment periodically. If that is the case, and if the elements are assumed to have the exponential failure distribution, then

$$p_i = 1 - e^{-(t - t_0)/\hat{\lambda}_i}, \qquad q_i = 1 - p_i,$$

where $t_0$ is the termination time of the last previous diagnosis, $t$ is the time at which diagnosis is about to be performed, and

$$\hat{\lambda}_i = L/n_i,$$

with $n_i$ being the number of times element $i$ was found defective and $L$ being the total operating time of the equipment.


VII. AN EXAMPLE

An equipment consists of five elements A, B, C, D and E. Each element is subject to random failure, i.e., has an exponentially distributed life. The equipment is put into operation at time zero, is examined and found good at 1 hour, is examined and found bad at 2 hours. Then

$$p_i = \frac{F_i(2) - F_i(1)}{1 - F_i(1)} = \frac{1 - e^{-2/\lambda_i} - (1 - e^{-1/\lambda_i})}{1 - (1 - e^{-1/\lambda_i})} = \frac{e^{-1/\lambda_i}(1 - e^{-1/\lambda_i})}{e^{-1/\lambda_i}} = 1 - e^{-1/\lambda_i},$$

where $\lambda_i$ is the mean life of element $i$, in hours. Table I gives the known mean lifetimes of the five elements, the $p_i$ calculated as above, and the testing costs $\tau_i$ (in arbitrary units); also, the values of $\tau_i q_i/p_i$ and the resultant ordering according to criterion (1).


TABLE I 


Element 


If we follow the model of Section IV-A (i.e., include superfluous tests at the end), the optimal sequence is thus found to be B, D, E, C, A. If we follow the general model, criterion (5) must be used to select one of the five elements to be transferred to the end of the testing sequence.


VIII. COMMENT 

A model similar to our general model has been treated by Johnson,4 but it can be shown that his expression for $f(S_N)$ is too large by the factor $(1 - Q_1)$. Johnson also treats the “single failure” model, but fails to indicate that he is treating that case in terms of conditional probabilities. Gluss5 has treated the “single failure, no elimination” model, though in a manner different from ours.


IX. ACKNOWLEDGMENT 


The author is indebted to Dr. R. E. Beckwith for 


his many helpful comments. 


4 S. M. Johnson, “Optimal Sequential Testing,” RAND Corp., Santa Monica, Calif., Res. Memo. No. 1652; 1956.
5 B. Gluss, “An optimum policy for detecting a fault in a complex system,” Operations Res., vol. 7, no. 4; July-August, 1959.




New Autopsy Techniques for Transistors and Relays*

C. B. CLARK†, MEMBER, IRE, AND E. F. DUFFEK†


Summary—A new method for opening 
hermetically-sealed metal cases is described. In- 
stead of mechanically sawing or cutting the metal, 
an electrochemical process is used. Two methods 
are described, static electrolysis (anodic dissolution) and jet electrolysis. Examples of the application of these methods to the “autopsy” of failed transistors with 6-mil Kovar shells and relays with 15-mil brass shells are shown.


INTRODUCTION 


One of the jobs often done by a reliability group 
involves the examination of failed parts. In many 
cases a cursory visual examination is sufficient 
to determine if the component died a natural death 
or if its demise was violent. In the case of her- 
metically-sealed devices such as transistors and 
relays, however, external examination is not very 
helpful. It is possible to use X-ray methods to 
find out what happened inside the unit, but if the 
component is small, high-definition X ray is 
needed. This type of X-ray equipment is expen- 
sive and often unavailable. 

It is much more informative to actually open 
the case (autopsy) to see what caused the failure. 
Opening a transistor case mechanically, however, 
often disturbs the rather delicate emitter and col- 
lector junction wires, particularly if the case is 
filled with “moose gunk.”1 The usual methods of
cutting and grinding were tried, but pressure from 
the filling material disturbed the lead wires as 
the case was removed. Attempts were also made 
to dissolve the moose gunk away by admitting 
solvent through a hole in the transistor case, but 
this was not satisfactory. 


*This work was in part supported by AF Contract 
33(604)-17231 and under subcontract to Hoffman Mili- 
tary Products Div., Hoffman Electronics Corp., Los 
Angeles, Calif. 

†Stanford Res. Inst., Menlo Park, Calif.

1 See Airlines Electronic Engineering Committee 1960 Letter No. 29 for origin of the term “moose gunk,” referring to the mysterious mixture of silicone grease and other materials put inside the case by some transistor manufacturers.


The “autopsy” technique we have developed 
uses an electrochemical process to remove the 
case, leaving the internal transistor and relay ele- 
ments intact. The method is very simple, involv- 
ing only a holder for the transistor or relay, a 
shaped cathode, acid solution and a small dc power 
supply. Two procedures were used, a (static) 
electrolysis method for transistors and a jet elec- 
trolysis method for relay cans. The following de- 
scription will indicate how these were carried out. 


TRANSISTOR “AUTOPSY” 


The failed transistors we were concerned with 
have a 170-mil OD case of about 6-mil-thick Kovar. 
The defective unit, held in a small clip, is posi- 
tioned within a 7/16-inch-diameter hole punched 
in the stainless steel cathode as shown in Fig. 1. 



Fig. 1.—Static electrolysis equipment. 


A small Alnico magnet is used to position the transistor so that equal distances exist between electrodes all around the transistor case. The entire assembly is then immersed in a 20 per cent hydrochloric acid solution at room temperature. A current of about 1/4 ampere is applied from a constant current supply, making the transistor case anodic with respect to the stainless steel holder. In 20 to 30 minutes the voltage across the terminals, initially 4 volts, begins to rise, indicating that the 6-mil case has been dissolved. The process taking place is essentially anodic dissolution (etching) of the Kovar. Hydrochloric acid was chosen on the basis that Kovar would dissolve rapidly only with the condition of applied current. Thus, for example, a heavy gold plate on a transistor may be more readily removed by anodic treatment in a sodium cyanide solution. Solutions to dissolve other metals should be considered similarly.

The unit is washed after removing it from the bath, and examined under a microscope. The moose gunk filling is sometimes clear enough so that the nature of the fault is discernible. If the filler material is not clear enough to see through, it can be dissolved in a vapor degreaser using ethylene dichloride at 30° to 35° C.

Fig. 2—Transistor with case removed, severe overload.

Two transistors opened by this technique and 
degreased are shown in Figs. 2 and 3. Fig. 2 
clearly represents a case of severe overload, 
since the emitter wire has been melted into a ball. 
The reason for the failure of the unit shown in 
Fig. 3 is not clear, but it is definitely not a case 
of very severe overload as in the previous unit. 
The emitter junction is fractured, either by 
mechanical shock or possibly by thermal shock 
from a mild overload. 


Fig. 3—Transistor with case removed, 
no severe overload. 


RELAY “AUTOPSY” 


One of the limitations of the static electrolysis 
method used on the transistors is nonuniform cur- 
rent distribution in the electrolyte. This is par- 
ticularly troublesome when the can shape is non- 
circular, as in the crystal can relays. 

To avoid difficulties with nonuniform current distribution, and allow higher etching currents, the equipment shown in Fig. 4 was used.

Fig. 4—Jet electrolysis equipment.

The relay to be opened is clamped in a rotatable fixture as shown. The paint is scratched off the relay case where the cut is wanted. A stream or jet of 10 per cent HCl solution is directed at the metal relay case through a metal tubing, and a 24-volt supply is connected between the metal fixture and the metal tubing, with the tubing negative and the relay case anodic. The motor speed control is adjusted to operate the motor at about 20 rpm. A current of about 1 ampere resulted in the case being cut in two in 20 to 30 minutes. About 1 liter of solution was made up and re-used several times by pouring it back into the upper container.

A picture of a relay opened by the above pro- 
cedure is shown in Fig. 5. Note the clean, well- 
defined cut. The case is made of brass, and is 
about 15 mils thick. Although some of the dilute 
acid reached the interior of the relay, little dam- 
age was done. 

The jet electrolysis method should be useful on 
many items, since it is fast and adaptable to dif- 
ferent shapes. It would probably be useful for 
opening transistors as well, particularly if the 
case is not round. 



Fig. 5—Relay with cover removed 
by jet electrolysis. 


CONCLUSION 


developed for examination of internal transistor 
and relay elements. The method is fast, inexpensive and adaptable to circular or irregularly shaped components and should aid those engaged in reliability and quality control evaluations, especially
those involving life tests. Although we have used 
this cutting method only on Kovar and brass cans, 
it should be adaptable to other metals without 
much difficulty. 




On Prediction of System Behavior* 


JOAN R. ROSENBLATT†


INTRODUCTION 


The subject of this paper is, in general, the problem of putting together a prediction concerning the way in which a complex system will function. Particular emphasis is given to a discussion of some of the types of definitions and choices which are made before such a prediction can be assembled.

A theoretical discussion of prediction of system performance deals with 1) a variable or a set of variables by which system performance is to be assessed, and 2) variables describing the parts of the system. The dependence of system performance on properties of parts of the system is assumed to be given by a mathematical description of the relations among these two types of variables. In the first part of this paper, a quite general formal statement of the problem is given. In the two further parts of the paper, illustrations are given suggesting some of the consequences of an explicit attempt to realize this formal approach.

In the second part of the paper, some familiar simple mathematical models for prediction are stated as special cases of the general formulation. Some illustrations indicate the possibilities for “enriching” these simple models.

In the third part of the paper, some possible consequences of explicit attention to problems of the definition of variables are developed, by means of simple illustrations involving the treatment of a mixture of two modes of failure.


A GENERAL STATEMENT OF THE 
PREDICTION PROBLEM 


Consider first the set of variables representing properties of the system by which its performance is assessed at time $t$. We may call this the system variable (in general a vector variable) and denote it by $x_0(t)$. Among the system properties which might be represented in $x_0(t)$ are the values of system outputs (where appropriate), the age of the system in terms of hours of active use, and the number of surviving redundant components. In addition, it may be appropriate for $x_0(t)$ to represent the presence or absence of attributes of the system: “operating” vs “failed,” “on” vs “off,” “in storage” vs “in use.”

*Reprinted from Proceedings of the New York University-Industry Conference on Reliability Theory, Ardsley-on-Hudson, N.Y., June 9-11, 1958.

†National Bureau of Standards, Washington, D.C.

The value or probability distribution of $x_0(t)$ determines the value of some “figure of merit” by which reliability is to be measured.

Consider next a set of variables $x_1(t), x_2(t), \ldots, x_n(t)$, which represent factors determining system performance at time $t$. Environmental factors and human operators may be included among these; for convenience, they will be called subsystem variables. A subsystem variable represents the properties of a subsystem which are sufficient to describe its effect on system performance at time $t$.

The specification of a set of subsystem variables 
involves, first, the choice of the number of sub- 
systems which is considered appropriate (or feasi- 
ble) for representation of system performance. To 
complete the specification of a set of subsystem 
variables, it is necessary to state how their values 
or joint probability distribution may be determined 
or estimated. 

Consider now the mathematical description of the dependence of system performance on properties of the subsystems. This will, in general, have two aspects. First, the value of the system variable $x_0(t)$ may be at least in part directly determined by the values of some of the subsystem variables. This functional relation may be denoted formally by

$$x_0(t) = F[x_1(t), x_2(t), \ldots, x_n(t); t].$$

Second, the description of dependence includes a statement of the set of assumptions by which the probability distribution of $x_0(t)$ is determined. These assumptions specify properties of the joint distribution of the subsystem variables and the manner in which the values and distribution of the subsystem variables affect the distribution of $x_0(t)$.

Finally, consider briefly the role of the time variable $t$ in this general statement. It is evident that in an operational prediction problem, this should be real time, with reference to a well-defined “zero” point which is the same for each subsystem.

The variable $t$ is included explicitly as an argument in the relation represented by $F$, because the form of the relation may be different at different times (e.g., storage vs operating times).

The explicit presence of t in connection with 
each of the subsystem variables calls attention to 
the possible presence (and representation) of 
sequential or cyclic time-phasing in the order of 
use of parts of the system. It may also be neces- 
sary or useful to account for “parts” of the sys- 
tem which are effectively present during some 
time intervals but not during others; for example, 
protective packing for storage, or the rocket itself 
in a rocket-launching system. 

Selection of appropriate variables and assump- 
tions provides the framework, the mathematical 
model, for prediction. Given this framework, the 
process of prediction consists of assembling in- 
formation about the subsystem variables, evalu- 
ating the system variable or its probability dis- 
tribution, and then calculating the values of some 
criteria of satisfactory performance, or figures 
of merit, which are functions of the system vari- 
able and its probability distribution. 

It is obvious, however, that the selection of the 
framework for prediction is an intrinsic and funda- 
mental part of the prediction process. It requires 
detailed specification or definition of each of the 
subsystem variables and of the system variable, 
and includes the choice of the form of the functional relation $F$ and the choice of assumptions by which the distribution of $x_0(t)$ is determined.

These definitions and choices will always be 
made, at least implicitly. The purpose of this 
paper is to examine some of the possible conse- 
quences and advantages of early explicit recogni- 
tion and investigation of the available choices. 


SIMPLE PREDICTION MODELS 


In this section of the paper, we note some of 
the types of choices which are made in using a 
model based on the “product rule.” Some obvious 
possibilities for enrichment of this simple ap- 
proach will emerge. 

The system variable is defined by

$$x_0(t) = \begin{cases} 1 & \text{if the system “is working” at time } t, \\ 0 & \text{otherwise.} \end{cases}$$

The performance criterion of interest is the probability that the system will survive through the time interval $(0,t)$. Let

$$p_0(t) = \Pr[x_0(\tau) = 1,\ 0 \le \tau \le t].$$

The subsystem variables are similarly defined:

$$x_i(t) = \begin{cases} 1 & \text{if the } i\text{th subsystem “is working” at time } t, \\ 0 & \text{otherwise,} \end{cases} \qquad i = 1, 2, \ldots, n.$$

The relation of $x_0(t)$ to the subsystem variables is specified by the assertion that the system “is working” if and only if each of the subsystems is:

$$x_0(t) = x_1(t) \cdot x_2(t) \cdots x_n(t).$$

Now it is assumed that each of the subsystem variables is statistically independent of the others. Under this assumption,

$$p_0(t) = p_1(t) \cdot p_2(t) \cdots p_n(t),$$

where

$$p_i(t) = \Pr[x_i(\tau) = 1,\ 0 \le \tau \le t].$$

If the additional assumption is made that the subsystem lifetimes are exponentially distributed, we obtain

$$p_0(t) = \exp\{-(\lambda_1 t + \lambda_2 t + \cdots + \lambda_n t)\},$$

where $\lambda_i$ is the failure rate of the $i$th subsystem.
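The product rule for exponentially distributed subsystem lifetimes can be illustrated with a short Python sketch. The function names are hypothetical, and the failure rates in the usage note are made-up values chosen only to show the calculation.

```python
import math


def system_survival_product(failure_rates, t):
    """Product of the subsystem survival probabilities exp(-rate * t)."""
    return math.prod(math.exp(-rate * t) for rate in failure_rates)


def system_survival_summed(failure_rates, t):
    """Equivalent form: exp of minus the summed failure rates times t."""
    return math.exp(-sum(failure_rates) * t)
```

For hypothetical rates `[0.001, 0.002, 0.0005]` per hour and `t = 100` hours, both functions return the same value up to rounding, namely `exp(-0.35)`.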
By means of various kinds of modifications of the subsystem survival probabilities $p_i(t)$, this simple model based on the assumption of independence among subsystems may be generalized in various ways. Suppose, for example, that the system is to be stored for a period of time $t_s$ before it is used. Letting $t = 0$ at the beginning of the storage period, we might have at time $t > t_s$,

$$p_i(t) = q_i(t_s) \times q_i'(t - t_s),$$

where $q_i(t_s)$ is the probability that the $i$th subsystem survives under storage conditions through the period $(0, t_s)$, and $q_i'(\tau)$ is the (conditional) probability that the $i$th subsystem survives through an operating interval of length $\tau$ after surviving a storage period. In a more complicated representation, the function $q_i'(\tau)$ could be made to depend on $t_s$.


As another example, consider the possible presence in a system of an “initial use” subsystem, which is required to survive through the time interval $(0, t^*)$ but is not needed thereafter. Thus,

$$x_i(t) = \begin{cases} 1 & \text{if the subsystem is working at time } t, \text{ or if } t > t^*, \\ 0 & \text{otherwise;} \end{cases}$$

if $q_i(t)$ denotes the survival probability of the subsystem, then

$$p_i(t) = \begin{cases} q_i(t) & \text{if } t \le t^*, \\ q_i(t^*) & \text{if } t > t^*. \end{cases}$$

Another approach to generalization may be made through the explicit representation of (externally caused) environmental stresses, in such a way that the subsystems may be assumed to be conditionally independent, given the environmental conditions. One representation of this type [2] is the following. Suppose there is a “stress level” which may be either “critical” or “not critical,” such that the system can work only if it is not critical. Let

$$x_e(t) = \begin{cases} 1 & \text{if environmental stress is not critical at time } t, \\ 0 & \text{if it is critical,} \end{cases}$$

and

$$p_e(t) = \Pr[x_e(\tau) = 1,\ 0 \le \tau \le t].$$

Now

$$x_0(t) = x_e(t) \cdot x_1(t) \cdots x_n(t)$$

and, if we interpret $p_i(t)$ as the conditional survival probability of the $i$th component when noncritical conditions prevail, then

$$p_0(t) = p_e(t) \cdot p_1(t) \cdots p_n(t).$$


A second representation involving the occurrence of environmental stresses is the following. Suppose there were random (i.e., Poisson) occurrences of “shocks” such that each occurrence produces changes in the forms of the survival probability functions for some or all subsystems. For instance, there might be random occurrences of a certain type of electrical transient, produced in one subsystem and causing degradation of the survival probabilities of several others. Suppose that the probability of continued survival of a subsystem at time $t$ depended only on the total number of “shocks” sustained, and the time elapsed since the most recent one. Thus, if $k$ “shocks” have occurred at times $\tau_1, \tau_2, \ldots, \tau_k$ ($k = 0, 1, 2, \ldots$; $\tau_0 = 0$), the (conditional) survival probability of the $i$th subsystem may be written

$$p_i^{(k)}(t) = p_{i0}(\tau_1)\, p_{i1}(\tau_2 - \tau_1) \cdots p_{ik}(t - \tau_k),$$

$0 \le \tau_1 \le \tau_2 \le \cdots \le \tau_k \le t$. The conditional survival probability for the system, given that $k$ “shocks” have occurred in the interval $(0,t)$, is then

$$p_0^{(k)}(t) = \frac{k!}{t^k} \int \cdots \int_{0 < \tau_1 < \cdots < \tau_k < t} \prod_{i=1}^{n} p_i^{(k)}(t)\; d\tau_1 \cdots d\tau_k,$$


and

$$p_0(t) = \sum_{k=0}^{\infty} e^{-\nu t}\,\frac{(\nu t)^k}{k!}\; p_0^{(k)}(t),$$

where $\nu$ is the rate of occurrence of the “shocks.” Whether or not $p_0(t)$ can be explicitly calculated depends on the forms of the conditional survival probability functions for subsystems.

A product-rule formalism may have serious 
disadvantages when the specification of subsystems 
and time relationships is implicit and incomplete. 
The object of the foregoing discussion is to indi- 
cate how the approach may be enriched so that in- 
dependence is assumed where independence is 
plausible, or where a conditional independence 
mechanism is reasonable. 


CLASSIFICATION OF FAILURES 


In this section, we consider another aspect of 
model construction, namely, the impact of the 
classification of failures on the model form and 
model testing. The discussion is conducted in 
terms of a set of simple illustrative calculations. 

Consider an equipment for which the conditions 
of use are as follows. It is used regularly (daily, 
weekly, etc.), or essentially so, in an activity which 
we may call a “mission.” It is “turned on” at the 
beginning of the mission, is in use throughout the 
mission, and is “turned off” at the end. The dura- 
tions of the missions may or may not be essentially 
constant. An example of such an equipment would 
be an airborne electronic instrument. 


The question arises whether the lifetime of a part of such an equipment should be measured (and predicted) in terms of use-hours (total flying time, say) or in terms of the number of uses (total number of missions flown). The time between missions will be ignored in this discussion.

Suppose that the underlying life distribution for 
the equipment has the following form: 


$$\Pr(\text{survive a mission of duration } t) = p\,e^{-\lambda t},$$

where

$p$ = probability of surviving “turn-on” (including, e.g., electrical transients, operator errors);

$e^{-\lambda t}$ = probability of surviving to time $t$ on condition that no turn-on failure occurs, where $\lambda$ is the (conditional) failure rate.


Suppose certain life-testing data were available 
from acceptance inspection tests, for evaluating 
the “reliability” of the equipment. In particular, 
suppose there were a fixed time on test c, and 
that test results consisted only of the proportion 
surviving. These data could be interpreted in accord with either of the two limiting cases of the
model stated above. For simplicity, sampling 
variation will be ignored, and the observed pro- 
portion surviving will be assumed to be equal to 
the survival probability. 


Number-of-Missions Interpretation

It is assumed that there is a probability

$$\gamma = \Pr(\text{survive one mission}),$$

independent of the length of the mission. In fact, the observed proportion is

$$\gamma = p\,e^{-\lambda c}.$$

Number-of-Hours Interpretation

It is assumed that there is a constant failure rate $\mu$. In fact, the observed proportion is

$$e^{-\mu c} = p\,e^{-\lambda c},$$

whence

$$\mu = \lambda - \frac{1}{c}\ln p.$$


We turn now to an examination of the conse- 
quences of employing one or the other of these two 
interpretations, in predicting the “reliability” of 
the equipment. 


Case 1: Missions of Constant Duration $\delta$

It is desired to predict the probability that the equipment will survive $m$ missions. Thus, the quantity to be predicted is

$$P_0 = (p\,e^{-\lambda\delta})^m.$$

Consider the two predictors corresponding to the two interpretations stated above:

$$P_I = \gamma^m \quad \text{(Prediction I)}$$

$$P_{II} = e^{-\mu m \delta} \quad \text{(Prediction II)}.$$

Comparison of each of these with $P_0$ may be made in terms of the ratios

$$P_I/P_0 = e^{-m\lambda(c-\delta)},$$

$$P_{II}/P_0 = p^{m(\delta-c)/c}.$$

If the actual mission duration $\delta$ is less than the test duration $c$, then Prediction I is pessimistic while Prediction II is optimistic. On the other hand, if $c < \delta$, Prediction I is optimistic while II is pessimistic. Specifically, e.g.,

$$P_I/P_0 < 1 \quad \text{if } \delta < c;$$

Prediction I is pessimistic in the sense that it understates the “true” survival probability.

Now, if $c = \delta$, both predictions are the same
and correct. They are, in effect, “nonparametric”
relative to this model.

It may be remarked that the calculations for
this case suggest that if the relative weights of the
two modes of failure are unknown (both being pos-
sibly present), then estimation based on $c = \delta$
(test time = mission duration) would be desirable.
More generally, an investigation of a model repre- 
senting several possible modes of failure may lead 
to the selection of an experimental procedure and 
prediction technique which are not too sensitive to 
errors in assumptions about modes of failure. 
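
The Case 1 comparison can be checked numerically. What follows is a minimal sketch rather than part of the original analysis: the function name and the parameter values (p = 0.95, a conditional failure rate of 0.01 per hour, a 10-hour test, 5-hour missions, 20 missions) are illustrative assumptions.

```python
import math


def case1_predictions(p, lam, c, delta, m):
    """Return (P0, P_I, P_II) for m missions of constant duration delta.

    p     : probability of surviving turn-on
    lam   : conditional failure rate (lambda in the text)
    c     : fixed time on test
    delta : mission duration
    """
    observed = p * math.exp(-lam * c)        # proportion surviving the test
    gamma = observed                         # number-of-missions interpretation
    mu = -math.log(observed) / c             # number-of-hours interpretation
    p0 = (p * math.exp(-lam * delta)) ** m   # quantity to be predicted
    pred_1 = gamma ** m                      # Prediction I
    pred_2 = math.exp(-mu * m * delta)       # Prediction II
    return p0, pred_1, pred_2


# Assumed values with the mission shorter than the test (delta < c).
p0, pred_1, pred_2 = case1_predictions(p=0.95, lam=0.01, c=10.0, delta=5.0, m=20)
print(f"P0={p0:.4f}  Prediction I={pred_1:.4f}  Prediction II={pred_2:.4f}")
```

With these assumed values Prediction I falls below the true probability and Prediction II above it; setting the test time equal to the mission duration makes all three quantities coincide.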


Case 2: Missions of Mean Duration $\delta$

Now suppose that missions are of variable duration, but that the duration $D$ has a known probability distribution with mean $\delta$. It is desired to predict the probability that the equipment will survive $m$ missions, i.e., the quantity

$$Q_0 = p^m\,(E\,e^{-\lambda D})^m,$$

where $E$ is the expectation operator and $ED = \delta$. For illustrative purposes, some comparisons of predictors will be calculated for the case where $D$ has the exponential distribution. For this special case,

$$Q_0 = p^m\,(1+\lambda\delta)^{-m}.$$

Employing the number-of-missions interpretation, one would again use Prediction I. Now

$$P_I/Q_0 = \left[(1+\lambda\delta)\,e^{-\lambda c}\right]^m = 1 + m\lambda(\delta-c) + O(\lambda^2).$$

Thus, in almost the same way as in Case 1, Prediction I is pessimistic if $\delta < c$. If $\delta > c$ and $(\delta-c)$ is sufficiently large relative to $\lambda$, then Prediction I is optimistic.

When the number-of-hours interpretation is employed, two approaches are possible. First, Prediction II may be used as an approximation (ignoring variability of mission duration). We have

$$P_{II}/Q_0 = \left[(1+\lambda\delta)\,e^{-\lambda\delta}\,p^{(\delta-c)/c}\right]^m.$$

Now $(1+\lambda\delta)\,e^{-\lambda\delta} < 1$, so that Prediction II is pessimistic if $\delta > c$, again as in Case 1. In order that $P_{II}$ provide an optimistic prediction for Case 2, it would be necessary that $c \gg \delta$.

A second approach within the number-of-hours interpretation is based on the experimenter’s knowledge of the distribution of the variable mission duration $D$. We introduce a third predictor for the probability of surviving $m$ missions:

$$P_{III} = (E\,e^{-\mu D})^m = (1+\mu\delta)^{-m} \quad \text{(Prediction III)}.$$

Consider

$$P_{III}/P_{II} = \left[(1+\mu\delta)^{-1}\,e^{\mu\delta}\right]^m.$$

It is easily verified that

$$P_{III}/P_{II} \geq 1 \quad \text{if } \mu\delta \geq 0,$$

i.e., that $P_{III}$ tends to be more optimistic than $P_{II}$ (which was noted to be generally pessimistic).

ROSENBLATT: ON PREDICTION OF SYSTEM BEHAVIOR 27

Calculations for this simple model have illus- 
trated the possibilities for qualitatively appraising 
different biases of prediction in the event that there 
are two modes of failure with unknown relative 
importance. 
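
The Case 2 predictors can be compared in the same way. The sketch below assumes exponentially distributed mission durations, as in the illustration above, and uses made-up parameter values; the function name is hypothetical.

```python
import math


def case2_predictions(p, lam, c, delta, m):
    """Return (Q0, P_I, P_II, P_III) for exponential mission durations of mean delta."""
    observed = p * math.exp(-lam * c)       # proportion surviving a test of length c
    mu = -math.log(observed) / c            # failure rate under the number-of-hours view
    q0 = (p / (1.0 + lam * delta)) ** m     # true probability: p^m (E e^{-lam D})^m
    pred_1 = observed ** m                  # Prediction I
    pred_2 = math.exp(-mu * m * delta)      # Prediction II, ignores variability of D
    pred_3 = (1.0 + mu * delta) ** (-m)     # Prediction III, uses the distribution of D
    return q0, pred_1, pred_2, pred_3


# Assumed values with mean mission duration longer than the test (delta > c).
q0, pred_1, pred_2, pred_3 = case2_predictions(p=0.95, lam=0.01, c=5.0, delta=10.0, m=20)
print(f"Q0={q0:.4f}  I={pred_1:.4f}  II={pred_2:.4f}  III={pred_3:.4f}")
```

For these assumed values Prediction II lies below the true probability, Prediction III lies above Prediction II, and Prediction I is optimistic.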

If the bias can be serious, it may be desirable 
to alter the form of the testing or experimental 
procedure used to obtain data. 

A recent paper by Stoller [3] treats some sta- 
tistical issues which would arise in considering 
the particular model used as an illustration in this 
section. 


CONCLUSION 


In the context of a general representation of 
models of system performance, attention has been 
drawn to some basic methodological considera- 
tions which arise in the composition of prediction 
models for complex systems. In particular, we 
have examined the consequence of certain types of 
model or definition choices: first, choices con- 
cerning the relative independence or interdepend- 
ence of parts in the elaboration of a simple model; 
second, choices concerning the classification of 
modes of failure. These are basic elementary 
types of choices, and some aspects have been 
treated explicitly, as they must be treated in an 
operational approach to prediction problems for 
specific complex assemblies. Such choices are 
explicitly realized in the form of the data collec- 
tion or experimental program (including possibly 
some simulation experiments). 

An alternative statement of the methodological 
point of this paper may be made from the stand- 
point of the reliability statistician or engineer. 
The relevance and meaningfulness of reliability 
predictions are in large measure already deter-
mined by the form and fine structure of the sub-
system or component data. In the context of an
explicit model, we may expose and examine the 
definite methodological choices available to the 
reliability statistician or engineer for controlling 
the form of these data. 


REFERENCES 


Many persons have discussed particular aspects 
of the methodological issues considered in this 
paper. An illustrative reference is the section, 
“Measurement of Time for Reliability Evaluation” 
(Chapter III, Section 3.1, page 21 of [1]). An ex- 
haustive list of references has not been compiled. 


[1] Aeronautical Radio, Inc., “Concepts and Tentative Techniques for Reliability Assurance,” Progress Rept. No. 1, Air Force Reliability Assurance Program; February 15, 1956.

[2] J. R. Rosenblatt, “On prediction of system performance from information on component performance,” Proc. WJCC, pp. 85-94; February, 1957.

[3] D. S. Stoller, “A failure model for equipments undergoing complex operation,” Operations Res., vol. 6, pp. 723-728; 1958.


An Application of the Information Theory Approach to Failure Diagnosis*

E. J. KLETSKY†, MEMBER, IRE

Summary: Brulé, Johnson, and Kletsky¹ have developed a technique based on information theory which leads to highly efficient procedures for diagnosing equipment failures. This paper demonstrates by means of a practical example the validity of this technique. In addition, the feasibility of the approach is shown and the procedure to be used in its implementation is outlined in detail. The paper concludes with a general discussion including comments on the generality of the technique, the possibility of machine computation, and possible areas of application.


I. INTRODUCTION


As the complexity of newly-developed high-performance systems continues to grow, the associated problem of maintaining and repairing these systems becomes increasingly important. Demands for higher reliability, longer life, and shorter periods of down time are factors which work to increase the severity of the maintenance problem.

The problem has been attacked on several fronts. Manufacturers continue to improve the reliability of component parts, preventive maintenance procedures tend to reduce over-all down time, and careful mechanical and electrical design simplifies repairs.

One area of the over-all maintenance problem which has been neglected until recently is that of fault location. Yet, field experience indicates that this area accounts for as much as a third of the total down time on the equipment. It appears reasonable therefore that an examination of the basic fundamentals of diagnostic procedures could substantially reduce this significant portion of down time.


* This work was partially supported by Rome Air Development Center under Contract No. AF 30(602)-1833.

† Elec. Engrg. Dept., Syracuse University, Syracuse, N. Y.

¹ J. D. Brulé, R. A. Johnson, and E. J. Kletsky, “Diagnosis of equipment failures,” IRE Trans. on Reliability and Quality Control, vol. RQC-9, pp. 23-34; April, 1960.


Brulé, Johnson, and Kletsky have studied the 
problem of equipment diagnosis and have developed 
a technique based on information theory which 
leads, in theory at least, to highly efficient diag-
nostic procedures.¹,² The main purpose of the work
to follow is to demonstrate the validity of this 
technique. To this end, a diagnostic procedure for 
a relatively simple communications receiver has 
been developed using the proposed technique. For 
receivers of this general type, we know what an 
efficient diagnostic procedure should look like as a 
result of studying the trouble-shooting methods em- 
ployed by highly competent technicians with wide 
experience in receiver repair. The results show 
that the information theory technique yields a diag- 
nostic procedure which is not essentially different 
from that which would be used by a competent 
technician. 

As a result of the demonstrated usefulness of 
the technique, we suggest in Section VII that it will 
also prove useful in the development of self- 
monitoring machines, in the design of new systems, 
and in helping alleviate the maintenance problem 
associated with existing systems. 


II. TECHNIQUE


The information theory approach can be con- 
sidered as a formalization of the trouble-shooting 
techniques employed by an expert technician. In 
diagnosing an equipment failure, the technician has 
available a large number of tests. He is able to 
choose from these a set which is sufficient to diag- 
nose the equipment. To each of these tests he as- 
signs a cost which may be dependent on those tests 
previously performed. When a test is performed, 
the technician learns something about the equip- 
ment; that is, each test removes a certain amount 
of ambiguity concerning the location of the fault. 

The problem is to find a sequence of tests which will diagnose the equipment in an efficient fashion, that is, at minimum cost. Johnson² has shown that a highly efficient sequence of tests can be assured by choosing the tests on the basis of a figure of merit derived from information theory.

² R. A. Johnson, “An information theory approach to diagnosis,” Proc. Sixth Natl. Symp. on Reliability and Quality Control, Washington, D.C.; January 11-13, 1960.

The figure of merit to be used is the ratio of ambiguity removed by a test to the cost of performing the test. That is

$$F_k = \frac{\Delta A_k}{C_k} = \frac{-P\log_2 P - (1-P)\log_2(1-P)}{C_k},$$

where $P$ is the a priori probability that the test will pass and $C_k$ is the cost of performing the test. An efficient sequential testing diagram can be constructed using this figure of merit according to the following schedule:

1) Evaluate $F_k$ for each of the possible tests.

2) Choose the test with the highest $F_k$.

3) Alter the cost of performing the remaining 
tests on the basis of having performed this 
and other tests previously. 

4) Alter the a priori probability of passing for 
each of the remaining tests on the basis of 
knowledge gained by having performed this 
and other tests previously. 

5) Repeat the procedure until the entire se- 
quential diagram is determined. 
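
Steps 1 and 2 of the schedule can be sketched in a few lines. The function names and the three candidate tests below are hypothetical, chosen only to show that a cheap test with a lopsided pass probability can outrank a costlier test that splits the possibilities evenly.

```python
import math


def ambiguity_removed(p_pass):
    """Binary entropy, in bits, of a test that passes with probability p_pass."""
    if p_pass <= 0.0 or p_pass >= 1.0:
        return 0.0  # a test whose outcome is certain removes no ambiguity
    return -p_pass * math.log2(p_pass) - (1.0 - p_pass) * math.log2(1.0 - p_pass)


def figure_of_merit(p_pass, cost):
    """Ambiguity removed per unit cost."""
    return ambiguity_removed(p_pass) / cost


def choose_test(tests):
    """tests maps a test name to (p_pass, cost); return the name with the highest merit."""
    return max(tests, key=lambda name: figure_of_merit(*tests[name]))


# Hypothetical candidates, not taken from Table I: (pass probability, cost in man-hours).
candidates = {"A": (0.50, 0.10), "B": (0.90, 0.02), "C": (0.40, 0.05)}
print(choose_test(candidates))
```

Steps 3 to 5 then repeat this selection after the costs and pass probabilities have been updated for the new state.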


III. APPLICATION

In order to demonstrate the use of the above 
schedule, a standard Air Force communication 
receiver (R-278B/GR) has been analyzed and a 
diagnostic procedure prepared. The receiver can 
be represented by the elementary equipment dia- 
gram shown in Fig. 1. This diagram represents 
the interactions between the power supply, 
mechanical tuning system, and signal circuits 
which are necessary to provide a useful output. 
Also shown are the required primary stimuli 
(antenna signal, primary power, mechanical tuning). 
The assumption has been made that the mechanical tuning system is in operating condition and, hence, failures can only occur either in the power supply or signal circuits. It is further assumed that interconnecting cables are in operating condition. It must be pointed out, however, that these assumptions have been made only in the interest of simplicity and that the procedure is inherently capable of handling the unrestricted problem as well.

Fig. 1: Elementary equipment diagram. (Legible labels: Signal Circuits, Mechanical System, Antenna Signal, Primary Power, Mechanical Tuning, Output.)


Power Supply Analysis


Study of the receiver circuit diagram allows construction of an equipment diagram for the power supply, as shown in Fig. 2. Each box represents a functional element to which a failure may be attributed. Lines entering a functional element represent input stimuli (electrical, mechanical, or other) which must be present before the element is capable of providing an output. An output will be found if and only if the necessary stimuli are present and the functional element is good.

The equipment diagram provides the means by which an exhaustive list of possible tests can be formed. A test may consist of supplying all the necessary stimuli to an element and observing the response of this element. (For example, Rectifier 1 can be tested by supplying 115 volts ac and observing, as a response, the presence of the “raw B+” signal.) Or, tests may consist of supplying stimuli at the input of a group (cascade and/or parallel) of elements and observing a single output response. In general, the number of possible tests is exceedingly large. It is desirable to be able to reduce this number to more manageable proportions. This can be done by considering the cost associated with performing a test.

Fig. 2: Power supply equipment diagram. (Legible labels: Primary Switch, Manual Selector Switch, Blower, Filament Supply 1 and 2, Rectifier 1 to 3, Filter, Muting Relay, Regulator; signals include 115 v AC, raw B+, raw bias, blower air, muting relay control, 230 v DC, 200 v DC, and 150 v DC.)


Test Cost Analysis


The cost of performing a test at a particular time is a function of many parameters. The most important of these are listed below:

1) Test equipment required.

2) Present state of disassembly of equipment under test.

3) Additional disassembly of equipment required.

4) Cost of supplying external signals.

5) Cost of actual test performance.

It is seen that the cost of performing a test is dependent on what tests have already been performed. Any method used to find an efficient testing procedure must consider this point.

Each test consists of a sequence or set of “unit operations,” each of which is assigned a cost. The test cost is the sum of the individual unit operation costs. Unit operations can be either of fixed cost or of variable (decreasing) cost. Once a test is selected, those operations of variable cost associated with the test are appropriately modified. Hence, subsequent tests requiring unit operations which have been previously used will, in general, have lower costs.

Unfortunately, data do not exist which allow an exact determination of unit operation costs. Reasonable estimates have been made based on examination of the equipment under test, availability of test equipment, time required to actually perform the test in the laboratory, etc. These estimates have been reduced to equivalent costs expressed in man-hours. The number of digits carried in the test cost reflects the wide range of costs associated with the unit operations. It is hoped that eventually, better time and cost studies will have been made which will allow more precise cost estimates.

Using the cost analysis as a guide, a subset of the large number of available tests can be chosen as reasonable tests to perform. This group of tests is shown in Table I, along with the initial cost of performing the test. Each test is designated by an N-digit binary number where N is the number of functional elements. (N = 13 for the power supply.) The test number carries a “zero” in every position corresponding to an element which must be “good” in order that the test pass. In general, the test designation asks the question: “Are all of the elements i, j, ..., k good?”, where i, j, ..., k specify those elements which have a zero in the test designation. The binary test designations are easily determined from the equipment diagram. The alphabetic symbols for signals in Table I refer to those shown in Fig. 2.

For reference, a detailed description of a se- 
lected set of these tests is shown in the Appendix. 


Probability Analysis 


In addition to a suitable subset of tests, the application of the information theory technique requires that the a priori probability of failure of each functional element be known. An estimate of these probabilities can be made using the raw data from RCA investigations³ of the R-278B/GR Receiver. Table II shows part quantities and corresponding failure rates for each of the functional elements of the power supply. (Failure rates shown are 1000 times the failure rate expressed in per cent/1000 hours.) The a priori probability of failure of a given functional element is found by dividing its failure rate by the failure rate of the entire power supply. The procedure used here is not unlike that used by Ryerson and others in estimating failure probabilities.⁴

Using the derived a priori probabilities of failure, it is possible to compute the probability with which each of the tests in Table I will pass. This is done by summing the nonzero probabilities of the test over all the elements not known to be good in the previous state. (For example, $P_{16}$ = 0.012 + 0.003 + 0.003 + 0.362 + 0.010 + 0.011 = 0.401.) The result of this operation is shown in the appropriate column of Table I.


TABLE I

POWER SUPPLY TESTS

(The body of this table is not legible in the scan. Legible column headings include: Test No.; Tested Signal; Filament 1; Filament 2; Rectifier 1; Rectifier 2; Rectifier 3; Filter 1; Filter 2; Filter 3; Muting Relay; 150-Volt Regulator; Signals Required; Initial Probability of Passing. Tests are numbered through 24.)


TABLE II

COMPUTATION OF FAILURE PROBABILITIES FOR POWER SUPPLY

(Part counts and rates are not legible in the scan. Row headings: number and total rate for capacitors, relays, coils, resistors, switches, transformers, tubes, heaters, and blower; total failure rate; probability of failure. Legible column headings include Filament No. 1, Filament No. 2, Rectifier No. 1, Rectifier No. 2, Filter No. 1, Filter No. 2, and 150-volt Regulator.)

Probability of Failure: 0.005 | 0.012 | 0.003 | 0.003 | 0.255 | 0.255 | 0.362 | 0.015 | 0.010 | 0.011 | 0.007 | 0.002 | 0.060


³ “A Prediction of AN/GRC-27 Reliability,” RCA Service Co., Inc., RADC-TN-58-18.

⁴ C. M. Ryerson, “RCA Reliability Program and Long Range Objectives,” RCA, Camden, N. J.; March 15, 1955.
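
The summation in the example for test 16 can be written out as a short sketch. Two things here are assumptions: the 13-digit designation is hypothetical, chosen only so that it selects the six terms quoted in the example, and the division by the total probability of the elements still in question is one reading of how the probabilities are renormalized in later states.

```python
def pass_probability(test, state, failure_probs):
    """Probability that a test passes, given the current state.

    test, state : strings of '0'/'1', one digit per functional element.
    A '1' in the state marks an element not yet known to be good; a '1' in
    the test designation marks an element the test does not check.
    """
    live = [j for j, s in enumerate(state) if s == "1"]
    total = sum(failure_probs[j] for j in live)
    unchecked = sum(failure_probs[j] for j in live if test[j] == "1")
    return unchecked / total  # renormalizing by `total` is an assumption


# A priori failure probabilities in the order printed in the Table II row.
probs = [0.005, 0.012, 0.003, 0.003, 0.255, 0.255, 0.362,
         0.015, 0.010, 0.011, 0.007, 0.002, 0.060]

# Hypothetical designation that picks out the six terms of the P16 example.
t16 = "0111001011000"
initial_state = "1" * 13
print(round(pass_probability(t16, initial_state, probs), 3))
```

In the initial state every element is in question and the total is unity, so the result reduces to the plain sum given in the text.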


Formation of the Testing Diagram 


All the data required for the application of the figure of merit, $F_k$, have now been compiled. The figure of merit for each test can be computed in a straightforward manner. It should be emphasized that many tests can be thrown out prior to calculation by noting that high cost and/or low $\Delta A_k$ lead to low $F_k$. The test with the highest $F_k$ is chosen as the first test in the testing diagram. From this test the two following states are constructed. The manner in which these states are determined is best illustrated by reference to Fig. 3.

Fig. 3: Determination of states following a test. (The diagram relates the current state and the test designation to the next upper state and the next lower (failed) state; partly legible digit strings include 10100, 10110, and 00010.)


$S_{k-1}$ represents the state of the equipment prior to performing the test $T_k$. This state is given by an N-digit number containing only the digits 1, 0, and $\bar{0}$. There is a 1 in each position corresponding to elements not yet tested. There is a 0 in each position corresponding to elements known to be good. There is a $\bar{0}$ in each position corresponding to elements inferred to be good on the basis of a previously failed test. In the initial state, there are 1’s in all positions since none of the elements have been tested.

$S_k$ represents the state of the equipment if test $T_k$ passes. This state is computed by multiplying $S_{k-1}$ and $T_k$ digit by digit without carry. $\bar{S}_k$ represents the state of the equipment if test $T_k$ fails. It is computed by multiplying $S_{k-1}$ and the complement of $T_k$ digit by digit without carry. The 0’s in the complement of $T_k$ must be replaced by $\bar{0}$’s in order to prevent improper diagnosis if more than one element has failed.
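
The two state computations can be sketched as string operations. Several choices below are assumptions made for illustration: the letter "z" stands in for the special zero that marks elements inferred to be good, digits that are already zero are left unchanged, and the five-digit state and test designation are a hypothetical case consistent with the legible strings of Fig. 3.

```python
INFERRED = "z"  # stand-in for the zero that marks "inferred good after a failed test"


def state_if_passed(state, test):
    """Digit-by-digit product of the state and the test designation."""
    # A suspect element checked by the passing test (test digit '0') becomes known good.
    return "".join("0" if s == "1" and t == "0" else s for s, t in zip(state, test))


def state_if_failed(state, test):
    """Digit-by-digit product of the state and the complemented test designation."""
    # A suspect element the test did not check (test digit '1') is only inferred good.
    return "".join(INFERRED if s == "1" and t == "1" else s for s, t in zip(state, test))


def isolated(state):
    """A branch is complete when exactly one element is still marked 1."""
    return state.count("1") == 1


current = "10110"           # three elements still in question
designation = "10100"       # hypothetical test that checks the second, fourth, and fifth elements
print(state_if_passed(current, designation))
print(state_if_failed(current, designation))
```

Keeping the inferred-good marker distinct from an ordinary zero leaves a record of which elements were cleared only by inference, which matters if more than one element has failed.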

Following the selection of a test, the cost of 
performing the succeeding tests must be modified 
on the basis of the test or tests already performed. 
This is done by altering the cost of unit operations 
which are common to unit operations used in pre- 
viously performed tests. (For example, when 
Test T;g has been performed, the subsequent per- 
formance of Test To, has a cost of only 0.020 since 
unit operations 26 and 28 result in no additional 
cost.) 

Once a test is chosen and performed, the relative probabilities of element failure change. It is thus necessary to recompute the probability with which each of the possible succeeding tests will pass. As in the initial state, the new probabilities are found by summing the nonzero probabilities of the test over all the elements not known to be good in the previous state.

Additional tests in the testing diagram are de- 
termined by repeated application of the above pro- 
cedures. If any state contains a single 1, that par- 
ticular branch of the diagram is completed, since 
the element corresponding to the 1 has been iso- 
lated. The testing diagram is complete when 
every branch terminates in a state containing a 
single 1. 
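
The repeated application of these steps can be sketched as a short recursive routine. This is a simplified illustration and not the procedure used to produce Fig. 4: test costs are held fixed instead of being modified after each test, a single flag replaces the separate known-good and inferred-good digits, the verifying test is omitted, and the four-element example is invented.

```python
import math


def entropy(p):
    """Binary entropy in bits; zero when the outcome is certain."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def build_diagram(state, tests, costs, probs):
    """Return a nested tuple (test, subtree_if_passed, subtree_if_failed).

    state : list of 0/1 flags, 1 = element still suspect
    tests : name -> list of 0/1 digits, 0 = element checked by the test
    A leaf is the index of the isolated element.
    """
    suspects = [j for j, s in enumerate(state) if s == 1]
    if len(suspects) == 1:
        return suspects[0]
    total = sum(probs[j] for j in suspects)
    best_name, best_merit = None, 0.0
    for name, digits in tests.items():
        p_pass = sum(probs[j] for j in suspects if digits[j] == 1) / total
        merit = entropy(p_pass) / costs[name]
        if merit > best_merit:
            best_name, best_merit = name, merit
    if best_name is None:
        return tuple(suspects)  # no remaining test can separate these elements
    digits = tests[best_name]
    passed = [s * d for s, d in zip(state, digits)]
    failed = [s * (1 - d) for s, d in zip(state, digits)]
    return (best_name,
            build_diagram(passed, tests, costs, probs),
            build_diagram(failed, tests, costs, probs))


# Hypothetical four-element example; none of these numbers come from the paper.
probs = [0.4, 0.3, 0.2, 0.1]
tests = {"A": [0, 1, 1, 1], "B": [0, 0, 1, 1], "C": [0, 0, 0, 1]}
costs = {"A": 1.0, "B": 1.0, "C": 1.0}
print(build_diagram([1, 1, 1, 1], tests, costs, probs))
```

Every branch of the returned structure ends in a single element index, which corresponds to a state containing a single 1.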

The resulting testing diagram for the power sup- 
ply is shown in Fig. 4. Also shown are the costs 
of performing the tests and the a priori probability 
of failure of each functional element. Note that it 
is also possible to compute the average cost of lo- 
cating a fault in the power supply. This is given 
by 


$$\text{average cost} = \bar{C} = \sum_{j=1}^{N} \ell_j P_j + C_v P_v,$$

where

$P_j$ = initial a priori probability of failure of jth element

$\ell_j$ = sum of test costs required to reach jth element

$N$ = number of elements

$C_v$ = cost of verifying test

$P_v$ = probability that verifying test must be performed.
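
As a check on the arithmetic, the average cost can be recomputed from the probability and cost pairs that can be read from Fig. 4. This is a sketch: the function and argument names are hypothetical, and the pairs are transcribed from the scan, so individual digits may be misread.

```python
def average_cost(reach_costs, failure_probs, verify_cost, verify_prob):
    """Expected cost of locating a fault: sum of reach cost times probability,
    plus the verifying-test cost times the probability that it is needed."""
    expected = sum(l * p for l, p in zip(reach_costs, failure_probs))
    return expected + verify_cost * verify_prob


# (probability of failure, cost to reach the element) as read from Fig. 4.
pairs = [(0.012, 0.0201), (0.005, 0.0201), (0.011, 0.0351), (0.362, 0.0351),
         (0.003, 0.0202), (0.003, 0.0203), (0.255, 0.0405), (0.015, 0.0403),
         (0.255, 0.0403), (0.060, 0.0553), (0.007, 0.0703), (0.010, 0.0903),
         (0.002, 0.0903)]
probs = [p for p, _ in pairs]
costs = [l for _, l in pairs]

total = average_cost(costs, probs, verify_cost=0.020, verify_prob=0.060)
print(f"{total:.4f} man-hours")
```

The result agrees with the figure's average cost of about 0.04 man-hours to within rounding.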


The verifying test must be used to determine unequivocally the state of the element isolated at the extreme end of the upper branch in the testing diagram. This element has been isolated only by inference. The verifying test to be used is the cheapest one which explicitly tests this final element.

The testing diagram shown in Fig. 4, in conjunction with the test descriptions of the type found in the Appendix, is sufficient to perform a diagnosis of the power supply.

Exactly the same procedure is used to prepare 
a testing diagram for the signal circuits. The 
data concerning tests, probabilities, and costs 
must be manipulated in a similar fashion. The 
resulting testing diagram is shown in Fig. 5. 


IV. DIAGNOSTIC CHART 


The information given by the two testing diagrams developed for the receiver can be easily displayed in an extremely useful chart suitable for use by relatively inexperienced technicians. This diagnostic chart is a unique written description of the testing diagram in which the diagram is replaced by a step-by-step procedure.⁵ The testing procedure to be followed at each step (test) is outlined in detail. The result of each test leads to a binary (Yes-No) result. On the basis of the result, the chart indicates the next step to be followed or the fault which has been isolated. Hence diagnosis becomes an efficient predetermined routine even for the inexperienced technician.

Fig. 4: Power supply testing diagram. (The tree itself is not legible in the scan; the verifying test has cost .020. The accompanying average-cost computation, a priori probability times cost to reach each element, reads as follows.)

    .012 x .0201 = .0002
    .005 x .0201
    .011 x .0351
    .362 x .0351
    .003 x .0202
    .003 x .0203
    .255 x .0405
    .015 x .0403
    .255 x .0403
    .060 x .0553
    .007 x .0703
    .010 x .0903 = .00090
    .002 x .0903 = .00018
    Sum                         = .03961
    Verifying test: .060 x .020 = .0012
    Average cost                = .04 hrs


V. MACHINE COMPUTATION 


Earlier sections have demonstrated that the in-
formation theory approach to diagnosis leads to
efficient and practical fault-locating procedures.


E.J. Kletsky, “Diagnosis of Equipment Failures—Part 
” Rome Air Dev. Center, Griffis AFB, N.Y., Tech. 
ept. No. 60-67B; April, 1960. 


The individual computations involved in the prep- 
aration of these procedures are simple and of a 
highly repetitive character. However, for large 
problems, hand calculation becomes exceedingly 
laborious and time-consuming. For this reason, 
a sample digital computer program has been pre- 
pared for the calculation of diagnostic procedures. 

The program was written specifically for the 
basic IBM-650 magnetic drum computer with a 
2000-word drum and no external tape storage. The 
basic logic of this program represents one possi- 
ble computing routine and can be used as starting 
point in writing programs for operation on larger 
machines and for solution of more complex prob- 
lems. 

The computer program is able to handle the entire computation provided that the input data fall within the following limits:

1) The number of elements, N, is 0 < N < 10.

2) The number of tests is less than 100.

3) The number of unit operations with fixed cost is greater than 0 and less than 25.

4) The number of unit operations with variable cost is greater than 0 and less than 25.

5) Each variable cost unit operation can take on an average of no more than 3 different values.

6) The maximum number of unit operations per test is 10.

This program has been used to compute the testing diagram for the signal circuits of the R-278B/GR receiver. The entire computation is carried out in approximately 4 minutes. This compares with about 3 hours for hand computation.

Fig. 5: Signal circuits testing diagram. (The tree itself is not legible in the scan. The accompanying computation sums the products of a priori probability and cost for nine elements to .6002 and adds .134 x .385 = .0516 for the verifying test, of cost .385, giving an average cost of approximately 0.65 hours.)


It is appreciated that in most practical cases, 
the number of tests may well exceed 100 and the 
number of elements will certainly exceed 10. 
Under these conditions, a machine with additional 
storage (tape, cards) becomes necessary. How- 
ever, the logical structure of the program re- 
mains fixed and hence expansion for larger prob- 
lems is relatively easy. 


VI. GENERAL COMMENTS 


Data Accumulation 


Before the methods described can be applied in practice, cost data and probability data must be available. The accumulation of these data represents the most serious difficulty encountered.

The acquisition of failure probability data has been made somewhat easier by recent investigations of the RCA Service Company and by the published reliability data made available by component manufacturers. Use of these data, coupled with engineering experience, allows reasonable estimates to be made of the required functional element failure probabilities. Very accurate failure probabilities are not required, since the information gained from a test is relatively insensitive to the failure probabilities of the individual elements. This is particularly true for those tests which yield a high information gain which are, therefore, selected by the information theory figure of merit.

Accumulation of cost data is more difficult. Little information is available concerning the cost associated with performing unit operations which go to make up the cost of a test. At present, examination of these operations in the laboratory, coupled with good engineering intuition, provides the best estimate of their costs. It is unfortunate that more precise determinations have not been made. This is particularly true in light of the fact that the figure of merit, $F_k$, is considerably more sensitive to variations of cost than it is to variations of probabilities. For this reason, considerable care should be given to the estimation of unit operation costs. In the development of new systems, reasonably accurate estimates of these costs can be made by personnel responsible for maintenance aspects of the design.
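
The difference in sensitivity can be seen in a small numerical sketch. The operating point (a pass probability of 0.5 and unit cost) and the 20 per cent perturbations are arbitrary choices made for illustration.

```python
import math


def figure_of_merit(p_pass, cost):
    """Ambiguity removed (binary entropy, in bits) per unit cost."""
    h = -p_pass * math.log2(p_pass) - (1.0 - p_pass) * math.log2(1.0 - p_pass)
    return h / cost


base = figure_of_merit(0.5, 1.0)
prob_shift = figure_of_merit(0.4, 1.0) / base   # pass probability lowered by 20 per cent
cost_shift = figure_of_merit(0.5, 1.2) / base   # cost raised by 20 per cent
print(f"probability change: {prob_shift:.3f}, cost change: {cost_shift:.3f}")
```

Near an even split the ambiguity removed is almost flat in the pass probability, so the 20 per cent error in probability lowers the figure of merit by about 3 per cent, while the same relative error in cost lowers it by about 17 per cent.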


senerality of Procedure 


The maintenance of complex electronic systems
is usually carried out at several levels. These
levels depend primarily on the location of the system
and on complicated operational and logistical
problems involving allowable equipment down time,
availability of test facilities and equipment, replacement
inventory, manpower, and cost. The
solution to the failure diagnosis problem is somewhat
different for each of these levels. However,
the method by which this solution is obtained remains
unchanged. An example will clarify this.

A typical aircraft fire-control system shown in
Fig. 6 might include a tracking radar, a computer,
and a communication link with the ground-based
combat control center. Each of these subsystems
is in turn composed of one or more self-contained
"black boxes." The black boxes perhaps contain
removable subchassis, and these in turn contain
the basic components (tubes, resistors, etc.) which
make up the entire system.

At the first level of maintenance, the crew chief
requires an efficient procedure which tells him
which of the subsystems has failed so that the appropriate
second-level maintenance procedure
may be initiated. The first level usually consists
of a sequence of tests which are performed from
equipment operating positions and require very
little external test equipment. The function of
second-level maintenance is to locate and replace
(or perhaps, repair) faulty black boxes. This requires
an inventory of spares but only a minimum
of test equipment, tools, and time. Third- and
fourth-level maintenance is usually performed at
a later time employing more elaborate test equipment
and, in general, requiring large inventories
of replacement parts.

Fig. 6. Maintenance levels for hypothetical
fire-control system. [Block diagram. Legible labels:
Communication Set, Computer, Tracking Radar,
Ground Control Center, Power Supply, RF Amp, Mixer,
Tubes, Resistors, Capacitors, Coils, Etc.]

Note that at each level of maintenance the prob- 
lem is to efficiently locate the faulty functional ele- 
ment. It is thus seen that the fundamental differ- 
ence between various maintenance levels lies in 
the definition of the functional units. These may 
vary in size from as large as an entire subsystem 
(radar set) to as small as a basic component (re- 
sistor or tube). The information theory approach 
leads to efficient diagnostic procedures for func- 
tional elements of any size. 


Effect of Symptomatic Information 


The sample diagnostic procedure developed for 
the R-278B/GR receiver assumed that there was 
no symptomatic information concerning the cause 
of failure. In general, some type of symptomatic 
information is always available. The operating log 
of the equipment indicates events prior to the fail- 
ure. Discussions with the equipment operator lead 
to an exposition of abnormal conditions prior to 
and during the failure. Attempts by the operator 
to circumvent the failure also provide symptomatic 
information. A cursory visual inspection revealing
damage due to rough handling or overheated areas,
or an aural inspection revealing abnormal noises,
is also a source of symptomatic information.

All of this symptomatic information can be in- 
corporated in the development of efficient diag- 
nostic procedures by noting that these symptoms 
effectively alter the a priori probabilities of failure
of the functional elements. For example, physical 
damage to the equipment case would increase the 
a priori probability of failure of elements such as 
vacuum tubes, crystals and sharply tuned circuits. 

Ideally, each symptom or set of symptoms re- 
sults in a different diagnostic procedure. Thus 
the most efficient procedure will be dictated by 
what symptomatic information is available. Note 
that absence of such information does not invali- 
date the general method but merely leads to pro- 
cedures which may be less efficient. 
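
A minimal sketch of how a symptom might alter the a priori probabilities is given below. It is an illustration under stated assumptions, not the procedure used in the paper: it assumes that exactly one functional element has failed, and it reweights each a priori probability by a relative likelihood of observing the symptom given that the element has failed (Bayes' rule). The element names, probabilities, and weights are all hypothetical.

```python
def update_priors(priors, symptom_weights):
    """Reweight a priori failure probabilities by symptom likelihoods.

    priors: element -> a priori probability that the element has failed.
    symptom_weights: element -> relative likelihood of the symptom if that
        element has failed (elements not listed default to 1.0).
    """
    weighted = {name: p * symptom_weights.get(name, 1.0)
                for name, p in priors.items()}
    total = sum(weighted.values())
    # Normalize so the altered probabilities again sum to one.
    return {name: w / total for name, w in weighted.items()}

# Hypothetical a priori probabilities for five functional elements.
priors = {
    "vacuum tube": 0.40,
    "crystal": 0.10,
    "tuned circuit": 0.10,
    "resistor": 0.20,
    "capacitor": 0.20,
}

# Hypothetical symptom: physical damage to the equipment case, assumed to be
# three times as likely when a shock-sensitive element has failed.
damaged_case = {"vacuum tube": 3.0, "crystal": 3.0, "tuned circuit": 3.0}

posterior = update_priors(priors, damaged_case)
for name, p in posterior.items():
    print(f"{name:14s} {priors[name]:.2f} -> {p:.2f}")
```

The altered probabilities would then replace the original ones when the diagnostic procedure is constructed, so that each symptom or set of symptoms can lead to a different procedure.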


VII. AREAS OF APPLICATION


Monitored Systems 


The operating parameters, inputs, and outputs 
of many large systems are continuously monitored 
to provide indications of system performance. 
When one or more monitored signals fall outside 
of the prescribed limits, an alarm is set indicating 
a system malfunction. Interpretation of the group 
of out-of-tolerance signals is equivalent to performing
first-level maintenance. This information
can be used to initiate one of a set of prepro- 
grammed diagnostic procedures designed for 
second-level maintenance. Each of these prepro- 
grammed procedures can be made efficient through 
the use of the information theory approach. This 
suggests the possibility of integrating the diag- 
nostic procedures developed here with those of 
self-monitoring machines to develop systems with 
a self-diagnosing capability. 
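
The selection of a preprogrammed procedure from the set of out-of-tolerance signals can be pictured as a table lookup. The following sketch is purely illustrative: the signal names, limits, and procedure names are hypothetical and are not taken from any system described here.

```python
# Hypothetical prescribed limits for two monitored signals (low, high).
LIMITS = {
    "plate_voltage": (210.0, 250.0),
    "avc_voltage": (-10.0, -0.5),
}

# Hypothetical table relating each alarm pattern to a second-level procedure.
PROCEDURES = {
    frozenset({"plate_voltage"}): "power supply procedure",
    frozenset({"avc_voltage"}): "receiver procedure",
    frozenset({"plate_voltage", "avc_voltage"}): "power supply procedure",
}

def out_of_tolerance(readings):
    """Return the set of monitored signals lying outside their limits."""
    return frozenset(
        name for name, value in readings.items()
        if not (LIMITS[name][0] <= value <= LIMITS[name][1])
    )

def select_procedure(readings):
    """Return the procedure to initiate, or None if no alarm is set."""
    alarms = out_of_tolerance(readings)
    if not alarms:
        return None
    # Fall back to a general procedure for patterns not in the table.
    return PROCEDURES.get(alarms, "general procedure")

print(select_procedure({"plate_voltage": 230.0, "avc_voltage": -3.0}))
print(select_procedure({"plate_voltage": 100.0, "avc_voltage": -3.0}))
```

Each entry in such a table would point to a diagnostic procedure that had itself been made efficient by the information theory approach.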


Check-Out Procedures 


A necessary part of the maintenance problem is 
that of determining when the system is working. 
This problem occurs either following a repair ac- 
tion or prior to an operational mission. A check- 
out procedure is desired in these cases. 

The diagnostic procedure developed for the 
equipment can be used as a check-out procedure 
by performing every test which lies in the upper
branch of the testing diagram. If all of these tests
pass, the equipment is in operating order. 

Any of the diagnostic procedures determined by 
the information theory method can be used as a 
check-out procedure by performing those tests ap- 
pearing in the upper branch. The optimum check- 
out procedure is that for which the total cost of 
performing the tests in the upper branch is a mini- 
mum. The testing diagram corresponding to opti- 
mum check-out is not necessarily the same as that 
diagram corresponding to optimum diagnosis since 
different criteria of optimum are used. 


New Systems 


Another area of applicability is that of new sys- 
tem design. The detailed operating characteristics, 
packaging proposals, and maintenance require- 
ments of new and projected systems are obviously 
familiar to the personnel engaged in their design. 
It appears reasonable for design engineers to give 
strong consideration to provisions for efficient 
failure diagnosis during the postexperimental and 
prepackaging phases of new system designs. It is 
hoped that the techniques and concepts developed
here will be of value to
those design engineers concerned with the general 
problem of failure diagnosis. 


Existing Systems 


The techniques described can also be applied to 
improve the maintenance of existing operational 
systems. The problem of retaining military technicians
over periods of time long enough for them
to become efficient trouble-shooters is well known.
As systems increase in complexity, this problem
will become more acute.

A proposed solution is to provide each system
with a set of diagnostic charts prepared in accordance
with the methods described in this report.
Any technician, with minimal fundamental electronic
system schooling, who is able to read these
charts and manipulate test equipment, should now
be able to diagnose the system nearly as well as a
maintenance man with a great deal of experience
with the particular system. Furthermore, the
technician is not limited to maintaining a particular
system, but is able to trouble-shoot any system for
which a proper diagnostic chart has been prepared.
The saving in training time and consequent versatility
of the technicians are easily appreciated.


APPENDIX


The following is a selected sampling from among 64 tests used in preparing the testing diagrams
shown in the text.


UNIT OPERATIONS AND TOTAL INITIAL COST

Unit operations:
  28  Connect ac power and turn power switch ON          0.0001
      Listen for blower                                  0.0000
Test: Is blower operating?
Total initial cost: 0.0001

Unit operations:
  26  Remove dust cover
  28  Connect ac power and turn power switch ON
  07  Multimeter measurement at J-903
Test: Does meter read between +100 and +125 volts dc?
Total initial cost: 0.0201

Unit operations:
  26  Remove dust cover
  01  Disconnect J-1206
  28  Connect ac power and turn power switch ON
  07  Multimeter measurement at pin 1 of J-1206
  03  Change position of channel selector switch
Test: Does meter read zero when motor is running and
      between +210 and +250 volts dc when motor stops?
Total initial cost: 0.0301

[6  A,Z]
Unit operations:
  26  Remove dust cover
  28  Connect ac power and turn power switch ON
  01  Remove P-601
  07  Multimeter measurement at pin 12 of J-1206
  03  Change position of channel selector switch
Test: Does meter read zero when motor is running and
      between +148 and +152 volts dc when motor stops?
Total initial cost: 0.0301

Unit operations:
      Supply 2.05-Mc signal at J-603                     0.250
  10  VTVM measurement at AVC jack J-1218                0.020
Test: Does meter begin to rise from noise level at
      about 0.5-volt signal and continue to rise to
      about -10 volts dc for 1.5-volt signal?
Total initial cost: 0.270

Unit operations:
  10  VTVM measurement at J-402                          0.020
      Run thru each of the 18 positions of the 10-Mc
      manual channel selector switch (from 39 to 22)     0.055
Test: Does meter read between -0.8 volt and -2 volts
      dc at each position?
Total initial cost: 0.075

[None]
Unit operations:
      Connect 600 ohms across output terminals (J-1213)
  01  Remove P-402
      Supply 165-µv 9-Mc signal at J-505
  10  VTVM measurement at J-1213
      Run thru all 10 positions of the 0.1-Mc manual
      channel selector switch (from 0.9 to 0.0)
Test: Does meter read between 40 volts rms and 60 volts
      rms at each position?
Total initial cost: 0.405


BACK ISSUES OF RELIABILITY PROCEEDINGS NOW AVAILABLE 


Out of stock editions of the Proceedings of the National Symposium on 
Reliability and Quality Control have recently been reprinted so that the 
Proceedings of all six symposia are now available from IRE Headquarters. 


The six Proceedings contain, respectively, the papers presented at the 
First National Symposium on Reliability and Quality Control held in New 
York City in November, 1954, and the second through sixth symposia held 
in Washington, D.C., each January from 1956 through 1960. 


Copies may be ordered at $5.00 each from the Institute of Radio Engi- 
neers, 1 East 79 Street, New York 21, N.Y. 




